2022 Volume 36 Issue 7 Published: 01 September 2022
  

  • Survey
    SHI Yuefeng, WANG Yi, ZHANG Yue
    2022, 36(7): 1-12,23.
    The goal of the argument mining task is to automatically identify and extract argumentative structures from natural language. Understanding an argumentative structure and its reasoning helps reveal the reasons behind claims, so argument mining has gained great attention from researchers. Deep learning based methods have been widely applied to these tasks owing to their capability to encode complex structures and represent latent features. This paper systematically reviews deep learning methods in argument mining, including fundamental concepts, frameworks, and datasets. It also introduces how deep learning based methods are applied to different argument mining tasks. Finally, the paper summarizes the weaknesses of current argument mining methods and anticipates future research trends.
  • Language Analysis and Calculation
    HUANG Ziyi, LI Junhui, GONG Zhengxian
    2022, 36(7): 13-23.
    Abstract Meaning Representation (AMR) parsing aims to derive the semantic structure of a sentence from the given text, while constituency parsing recovers its hierarchical syntactic structure. The two tasks are strongly complementary: AMR parsing can utilize the syntactic structure of the text, and constituency parsing can avoid ambiguity with the help of semantic information. This paper therefore proposes a joint learning method that exploits the advantages of both tasks. Moreover, to alleviate the limited data available for the two tasks, it introduces external corpora to obtain large-scale automatically labeled AMR graphs and syntax trees. Experiments show that the method effectively improves both tasks: AMR parsing gains 8.73 F1 points on AMR 2.0, and constituency parsing gains 6.36 F1 points on PTB.
  • Language Analysis and Calculation
    WANG Kai, LIU Mingtong, ZHANG Yujie, CHEN Yuanmeng, XU Jin'an, CHEN Yufeng
    2022, 36(7): 24-32.
    The semantics of a sentence is composed of the meanings of its constituents and their combination, so syntax-based semantic composition has been an important research direction in NLP. Popular tree-structured methods are difficult to apply to large-scale data because their dependence on specific tree structures blocks parallel computation. This paper presents a joint framework for graph-based dependency parsing and semantic composition. Without relying on an external syntactic parser, the method applies a graph neural network to the semantic composition computation, which supports parallel computation. Moreover, jointly learning the two tasks enables the model to capture syntactic structure and semantic context simultaneously. Experimental results on the LCQMC dataset show 79.54% accuracy, close to tree-based semantic composition methods, with prediction speed increased by up to 30 times.
  • Language Analysis and Calculation
    YAN Peiyi, LI Bin, HUANG Tong, HUO Kairui, CHEN Jin, QU Weiguang
    2022, 36(7): 33-41.
    Interrogative sentences have been studied extensively in linguistics, e.g. on the structural types of questions, but they still lack a systematic formal representation. We use graph-based Chinese Abstract Meaning Representation to annotate the semantic structure of Chinese interrogative sentences. A total of 2,071 sentences are selected from the Penn Chinese Treebank, Chinese textbooks for elementary schools, etc. The annotation reveals that the interrogative focus can be represented by the interrogative concept amr-unknown together with its semantic relation. Moreover, cause, modifier, and arg1 (patient) are the top-ranked interrogative foci, covering 26.45%, 16.74%, and 16.45% of the cases, respectively. Annotating and analyzing interrogative sentences with Abstract Meaning Representation provides a theoretical basis and resources for related studies of Chinese.
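    To illustrate the idea of marking the interrogative focus with amr-unknown, the following is a hypothetical sketch: the example sentence, the dictionary-based graph encoding, and the helper function are our own illustrative assumptions, not drawn from the paper's corpus or annotation tooling.

```python
# Hypothetical AMR-style graph for the question "你为什么学中文？"
# ("Why do you study Chinese?"). The cause of the studying event is the
# unknown being asked about, so it is annotated with amr-unknown.
amr = {
    "id": "x1",
    "concept": "学-01",  # predicate: study
    "relations": {
        ":arg0": {"id": "x2", "concept": "你"},            # agent: you
        ":arg1": {"id": "x3", "concept": "中文"},          # patient: Chinese
        ":cause": {"id": "x4", "concept": "amr-unknown"},  # interrogative focus
    },
}

def interrogative_focus(graph):
    """Return the semantic relations whose target is amr-unknown."""
    return [rel for rel, node in graph["relations"].items()
            if node["concept"] == "amr-unknown"]
```

    For this example, `interrogative_focus(amr)` returns `[":cause"]`, matching the paper's observation that cause is among the most frequent interrogative foci.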
  • Language Analysis and Calculation
    XING Yuqing, KONG Fang
    2022, 36(7): 42-49.
    Discourse relation recognition plays a crucial part in discourse parsing. In Chinese, the task is particularly challenging owing to the high proportion of implicit discourse relations, which lack explicit connectives as inference clues. This paper proposes a multi-layer local inference method for Chinese discourse relation recognition. It employs a bi-directional LSTM and a multi-head self-attention mechanism to encode the arguments independently, and then generates interactive pair representations via soft attention alignment between the arguments. The independent and interactive representations are combined to perform local inference. By stacking these local inference modules, the framework achieves a Macro-F1 of 67.0% on the CDTB corpus. Furthermore, a fully automatic discourse parser is established by incorporating the trained model into an existing transition-based Chinese discourse parser, which jointly learns discourse relations and nuclearity.
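    The soft-alignment local inference step can be sketched as follows. This is a generic NumPy illustration of attention-based alignment between two encoded arguments (in the style popularized by ESIM); the function names and the enhancement layout are our assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_inference(a, b):
    """a: (la, d) encoded argument 1; b: (lb, d) encoded argument 2.

    Soft attention aligns each token of one argument with a mixture of
    tokens of the other, then the usual local-inference enhancement
    [x; x_aligned; x - x_aligned; x * x_aligned] is formed.
    """
    e = a @ b.T                       # (la, lb) alignment scores
    a_hat = softmax(e, axis=1) @ b    # each a-token as a mixture of b-tokens
    b_hat = softmax(e, axis=0).T @ a  # each b-token as a mixture of a-tokens
    m_a = np.concatenate([a, a_hat, a - a_hat, a * a_hat], axis=-1)
    m_b = np.concatenate([b, b_hat, b - b_hat, b * b_hat], axis=-1)
    return m_a, m_b
```

    In the paper's framework, such modules would be stacked, with the BiLSTM and multi-head self-attention supplying the encoded arguments `a` and `b`.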
  • Language Resources Construction
    QIAN Qingqing, WANG Chengwen, XUN Endong, WANG Guirong, RAO Gaoqi
    2022, 36(7): 50-58.
    This paper presents a Chinese Chunk-Based Dependency Grammar (CCDG). With this grammar, predicate-dominated chunks can be identified within and between sentences, and elided parts of sentences can be recovered from the relations between chunks. The paper describes the principles of CCDG and defines its chunks and relations. Based on the CCDG, we have annotated 2,199 texts, altogether 1,800,000 words of encyclopedia and news text, and we describe the annotation procedure, label consistency, and data distribution in detail. From the current treebank, it is found that about 25% of Chinese clauses are not self-sufficient, and about 88% of core predicates govern one to three subordinate components.
  • Language Resources Construction
    WANG Hongrui, YU Dong
    2022, 36(7): 59-68.
    Moral wisdom, the ability to make moral judgments, is a unique element of human intelligence. Equipping machines with moral judgment in line with social moral principles is an important research issue in machine ethics. At present, research on moral judgment mostly deals with simple, coarse-grained judgments in an English-language context. This paper proposes a fine-grained Chinese moral semantic knowledge base for machine moral judgment. It designs a theoretical system with three parts: a moral behavior classification system, a moral framework representation system, and a moral intensity measurement system. A Chinese moral semantic knowledge base covering 15,371 words is finally completed.
  • Language Resources Construction
    LI Jiangtao, RAO Gaoqi
    2022, 36(7): 69-76.
    This paper explores formal features for question classification and summarizes the question types used in question corpus filtering. On a manually annotated Chinese question classification corpus, it conducts both rule-based and statistical experiments on Chinese question classification. In the experiments, a finite state machine built on the optimized feature set achieves a macro-averaged F1-score of 0.94, and a random forest model reaches 0.98.
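    A rule-based question classifier of this kind can be sketched with a few cue-word patterns. The cue words, type labels, and rule ordering below are our own illustrative assumptions, not the paper's optimized feature set or its finite state machine.

```python
import re

# Illustrative cue-word rules for Chinese question types. Order matters:
# "为什么" (why) contains "什么" (what), so the why-rule is checked first.
RULES = [
    ("why",    re.compile(r"为什么|为何")),
    ("what",   re.compile(r"什么|啥")),
    ("choice", re.compile(r"还是")),
    ("yes-no", re.compile(r"吗[?？]?$")),
]

def classify_question(sentence):
    """Return the first matching question type, or 'other'."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "other"
```

    A real feature set would also cover sentence-final particles, A-not-A patterns, and punctuation, which is where corpus-driven feature optimization pays off.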
  • Ethnic Language Processing and Cross Language Processing
    ZHOU Maoke, LONG Congjun, ZHAO Xiaobing, LI Linxia
    2022, 36(7): 77-85,97.
    Constructing a Tibetan dependency treebank is a fundamental task for subsequent technology development. This paper proposes a method for building a Tibetan dependency treebank by treebank conversion. First, the existing Tibetan phrase structure treebank is expanded. Then, conversion rules are designed based on the characteristics of Tibetan phrase structure trees and dependency trees. Finally, the automatic conversion results are proofread manually, yielding 22,000 Tibetan dependency trees. On a 5% sample of the treebank, the accuracy of dependency relations reaches 89.36% and that of head words reaches 92.09%. A neural network based dependency parsing model trained on the treebank achieves 83.62% UAS and 81.90% LAS.
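    The core of such a conversion is head percolation: each phrase projects a head word, and the non-head children of a phrase attach to that head. The toy tree format, the English placeholder tokens, and the head rules below are our illustrative assumptions; the paper's actual rules are tailored to Tibetan syntax.

```python
# A tree is (label, children) where children is a list for internal nodes
# and the word string for leaves, e.g. ("N", "tea").
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "N"}  # label of the head child

def convert(tree, deps=None):
    """Return (head_word, deps); deps is a list of (head, dependent) arcs."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):   # leaf: the word is its own head
        return children, deps
    heads = [convert(child, deps)[0] for child in children]
    head_label = HEAD_RULES.get(label)
    head_idx = next((i for i, c in enumerate(children) if c[0] == head_label), 0)
    head = heads[head_idx]
    for i, h in enumerate(heads):   # non-head children depend on the head
        if i != head_idx:
            deps.append((head, h))
    return head, deps
```

    For a sentence like "he tea drink" (SOV, as in Tibetan), both "he" and "tea" end up as dependents of the verb.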
  • Ethnic Language Processing and Cross Language Processing
    LIU Rui, KANG Shiyin, GAO Guanglai, LI Jingdong, BAO Feilong
    2022, 36(7): 86-97.
    Aiming at real-time and high-fidelity Mongolian text-to-speech (TTS) generation, a FastSpeech2 based non-autoregressive Mongolian TTS system (MonTTS for short) is proposed. To improve prosody naturalness and fidelity, MonTTS adopts three novel mechanisms: 1) Mongolian phoneme sequences are used to represent Mongolian pronunciation; 2) a phoneme-level variance adaptor is employed to learn long-term prosody information; and 3) two duration aligners, based on Mongolian speech recognition and Mongolian autoregressive TTS models respectively, provide the duration supervision signal. Besides, we build a large-scale Mongolian TTS corpus named MonSpeech. Experimental results show that MonTTS significantly outperforms the state-of-the-art Tacotron-based Mongolian TTS and standard FastSpeech2 baselines, with a real-time factor (RTF) of 3.63×10⁻³ and a mean opinion score (MOS) of 4.53 (see https://github.com/ttslr/MonTTS).
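    The duration supervision feeds FastSpeech2's length regulator, which expands phoneme-level hidden states to frame-level states by repeating each state according to its duration. The sketch below shows only this expansion step, with our own function name; in the real system the states are vectors and the durations come from the aligners.

```python
def length_regulate(phoneme_states, durations):
    """Expand phoneme-level states to frame-level states.

    phoneme_states: one state per phoneme (vectors in practice).
    durations: predicted or aligner-supervised duration (in frames)
    for each phoneme.
    """
    frames = []
    for state, dur in zip(phoneme_states, durations):
        frames.extend([state] * dur)  # repeat the state for dur frames
    return frames
```

    Because the frame count is known up front, the decoder can generate all frames in parallel, which is what makes the non-autoregressive design fast.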
  • Information Extraction and Text Mining
    LI Zhengguang, LIN Hongfei, SHEN Chen, XU Bo, ZHENG Wei
    2022, 36(7): 98-105.
    Chemical-induced disease (CID) relation extraction from biomedical articles plays an important role in disease treatment and drug development. To capture the semantic information of entities across sentences, this paper proposes a cross self-attention over the title, the abstract, and the shortest dependency paths (SDPs) to learn their mutual semantic information. The method enhances the semantic representations and captures complete document-level semantic information. Experimental results on the CDR corpus show that the method improves the extraction of document-level CID relations.
  • Information Extraction and Text Mining
    YIN Yajue, GAO Xiaoya, WANG Jingjing, LI Shoushan, XU Shaoyang, ZENG Yuhao
    2022, 36(7): 106-113.
    Patent matching aims to determine the similarity between two patent texts. Unlike free text, a patent consists of several text blocks, such as the title, abstract, and statement. To make full use of this multi-text information, this paper proposes a Multi-View Attentive Network (MVAN) to capture matching information from different views of a patent. First, the BERT model extracts the single-view matching features (title, abstract, or statement) of a patent pair. Then, an attention mechanism integrates these features into multi-view matching features. Finally, a multi-view learning mechanism jointly learns the single-view and multi-view matching features. Experimental results show that MVAN outperforms the baseline methods on patent matching tasks.
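    The attention step that integrates single-view features can be sketched as attentive pooling over view vectors. This is a generic NumPy illustration with our own names and a dot-product scoring function; the paper's attention parameterization may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_views(view_features, w):
    """Attention-weighted fusion of single-view matching features.

    view_features: (n_views, d) one matching-feature vector per view
    (e.g. title, abstract, statement). w: (d,) attention query vector.
    Returns a single (d,) multi-view matching feature.
    """
    scores = view_features @ w   # (n_views,) relevance of each view
    alpha = softmax(scores)      # attention weights over views
    return alpha @ view_features # convex combination of view features
```

    The multi-view learning step would then train classifiers on both the individual `view_features` rows and the fused vector jointly.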
  • Information Extraction and Text Mining
    ZHANG Zhaowu, XU Bin, GAO Kening, WANG Tongqing, ZHANG Qiaoqiao
    2022, 36(7): 114-122.
    In the field of education, named entity recognition is widely used in automatic question generation and intelligent question answering. Traditional Chinese named entity recognition models need to change the network structure to incorporate character and word information, which increases the complexity of the network. Moreover, data in the education domain demand very accurate entity boundaries, while traditional methods cannot incorporate position information and identify boundaries poorly. To address these problems, this paper improves the vector representation layer to integrate word, character, and position information, which better delimits entity boundaries and improves recognition accuracy. A BiGRU and a CRF serve as the sequence modeling layer and the annotation layer, respectively, to perform Chinese named entity recognition. Experiments on the Resume dataset and an education dataset (Edu) achieve F1 values of 95.20% and 95.08%, respectively. The results show that, compared with the baseline models, the proposed method improves both training speed and recognition accuracy.
  • Information Extraction and Text Mining
    FANG Zhengyun, YANG Zheng, LI Limin, LI Tianjiao
    2022, 36(7): 123-131.
    For structured scientific research project texts, this paper proposes a text classification method based on pre-trained networks such as BERT with two-view cross attention (TVCA) and its extension, multi-view cross attention (MVCA). MVCA targets the main chapter of a project text (the project abstract) together with two further chapters (research content, and research purpose and significance), extracting feature vectors with richer semantic information through a cross-attention mechanism to further improve classification performance. Applied to the classification of scientific publications and of research project texts from China Southern Power Grid, MVCA is significantly better than existing methods in both classification effect and convergence speed.
  • Information Retrieval
    LI Siying, SHEN Huawei, XU Bingbing, CHENG Xueqi
    2022, 36(7): 132-142.
    Evaluating the scientific impact of researchers is a long-standing challenge in scientometrics. Existing evaluation methods use only the citation relationships between papers as the carrier of impact diffusion, ignoring the role of the researchers themselves. This paper proposes a new method that evaluates researchers' scientific impact via an impact diffusion process in which citation relationships and authorship are modeled simultaneously. Verified on all 11 journals and 463,348 papers in the American Physical Society dataset, the proposed method is confirmed to be superior to existing evaluation methods that consider only citation relationships.
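    A simplified version of impact diffusion can be sketched with a PageRank-style iteration over the citation graph, plus an authorship matrix that passes paper impact back to researchers. The matrices, damping factor, and credit-splitting scheme below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def paper_impact(C, damping=0.85, iters=50):
    """PageRank-style impact diffusion over citations.

    C[i, j] = 1 if paper j cites paper i. Columns are normalized so each
    citing paper distributes its impact equally over its references.
    """
    n = C.shape[0]
    col_sums = C.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # papers with no references leak no mass
    P = C / col_sums
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * P @ r
    return r

def author_impact(r, A):
    """Aggregate paper impact to authors; A[a, p] = 1 if author a wrote p.

    Each paper's impact is split evenly among its co-authors.
    """
    credit = A / A.sum(axis=0, keepdims=True)
    return credit @ r
```

    The paper's contribution is to couple these two steps into a single diffusion process rather than applying them sequentially as done here.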
  • Information Retrieval
    CHEN Jiwei, WANG Haitao, JIANG Ying, CHEN Xing
    2022, 36(7): 143-153.
    Existing recommendation algorithms recommend items according to item or user similarity, without capturing the sequential patterns of user-item interactions. In fact, the interaction sequence between users and items contains important context information that is valuable for predicting future interactions. This paper proposes a sequential recommendation algorithm based on a generative adversarial model. A convolutional neural network serves as the generator to capture the sequential patterns of user interactions, and an attention mechanism serves as the discriminator to capture the temporal information of the sequence and the content attributes of items. An improved time embedding is used to model the periodic changes of interactions over time, and the generative adversarial network models the user's long-term and short-term preferences simultaneously. Experiments on the public MovieLens-1M and Amazon-Beauty datasets show that the proposed algorithm significantly improves over all baseline methods in terms of HR@N and NDCG@N.
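    A periodic time embedding of the general kind mentioned here can be sketched with sinusoidal features of the timestamp. The fixed frequency schedule below is our illustrative choice (borrowed from the common Transformer-style formulation); the paper's improved embedding presumably learns its parameters.

```python
import numpy as np

def time_embedding(t, dim=8, max_period=10000.0):
    """Map a scalar timestamp to a dim-dimensional periodic embedding.

    Uses dim // 2 geometrically spaced frequencies; each contributes a
    (sin, cos) pair, so nearby timestamps get similar embeddings while
    periodic patterns at several scales remain distinguishable.
    """
    freqs = max_period ** (-np.arange(dim // 2) / (dim // 2))
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])
```

    Such embeddings can be added to item embeddings so the discriminator sees when, not just in what order, interactions occurred.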
  • Sentiment Analysis and Social Computing
    ZHAO Zhiying, SHAO Xinhui, LIN Xing
    2022, 36(7): 154-163.
    Aspect-based sentiment analysis aims to automatically identify the sentiment polarity toward a specific aspect in a given sentence. To fully capture the syntactic information in the sentence, this paper presents GCN-aware Attention Networks (GCAN) for aspect-based sentiment analysis. Aspect-specific representations are generated by combining the sequential information captured by an LSTM with the syntactic features captured by a GCN. A bidirectional attention mechanism, tailored to aspects consisting of multiple words, generates the final aspect-specific contextual representation. Compared with the ASGCN model on the Twitter and SemEval14/15 datasets, classification accuracy increases by 0.34%, 0.94%, 1.43%, and 1.23%, and F1 by 0.53%, 1.55%, 1.60%, and 2.54%, respectively.
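    The GCN component operates over the sentence's dependency-tree adjacency matrix. Below is a single graph-convolution layer in the common Kipf-and-Welling style (self-loops plus degree normalization); the toy weight matrix and mean aggregation are illustrative, not GCAN's exact parameterization.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN layer over a dependency adjacency matrix.

    H: (n, d) node states (e.g. LSTM hidden states per word).
    A: (n, n) symmetric adjacency of the dependency tree.
    W: (d, d_out) layer weights.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True) # inverse degrees
    return np.maximum(0.0, D_inv * (A_hat @ H) @ W)  # mean-aggregate + ReLU
```

    Stacking such layers lets an aspect word gather sentiment cues from syntactically related words several dependency arcs away.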
  • Sentiment Analysis and Social Computing
    CAO Liuwen, ZHOU Yanyan, WU Changxing, HUANG Zhaohua
    2022, 36(7): 164-172.
    Aspect-level sentiment classification is a popular research topic that aims to automatically infer the sentiment polarities of aspects in text. Taking a fusion of multiple word embeddings as input, deep learning models achieve promising performance on this task. Instead of simply concatenating different word embeddings, this paper proposes a mutual-learning framework for fusing multiple word embeddings, in which general, domain-specific, and sentiment word embeddings are combined to boost performance. Specifically, we first construct a main model that takes the fusion of the three kinds of embeddings as input, then build three auxiliary models that each take a single kind of embedding as input, and finally train the main model and the auxiliary models jointly in a mutual learning manner. Experiments on three widely used datasets show that the proposed model significantly outperforms the benchmark methods.
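    The mutual-learning objective can be sketched as follows: each model minimizes its own cross-entropy plus a KL term pulling it toward its peer's predicted distribution. The two-model setup and function names are an illustrative simplification of the paper's one-main-plus-three-auxiliary design.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_learning_losses(p_main, p_aux, y):
    """Per-example losses for two models trained by mutual learning.

    p_main, p_aux: predicted class distributions; y: gold class index.
    Each model pays cross-entropy on the gold label plus a KL penalty
    for disagreeing with its peer.
    """
    ce_main = -np.log(p_main[y])
    ce_aux = -np.log(p_aux[y])
    loss_main = ce_main + kl(p_aux, p_main)
    loss_aux = ce_aux + kl(p_main, p_aux)
    return loss_main, loss_aux
```

    When the two models agree exactly, the KL terms vanish and each loss reduces to the ordinary cross-entropy, so mutual learning only adds pressure where the views disagree.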