2021 Volume 35 Issue 9 Published: 30 September 2021
  

  • Survey
    WANG Wanzhen, RAO Yuan, WU Lianwei, LI Xue
    2021, 35(9): 1-14.
    Artificial intelligence has received increasing emphasis in judicial practice in recent years. Based on the literature on intelligent models for assisting judicial cases, this paper identifies six challenges in legal judgement decision prediction: multi-feature crime prediction, multi-label crime prediction, multiple sub-task processing, unbalanced data, the interpretability of decision prediction, and the adaptation of existing algorithms to different types of cases. For each of these problems, the paper provides theoretical discussion, technical analysis, technical challenges, and trend analysis. The datasets used in this field and the corresponding evaluation metrics are also summarized.
  • Survey
    YUE Zengying, YE Xia, LIU Ruiheng
    2021, 35(9): 15-29.
    Pre-training technology has moved to the center stage of natural language processing, especially with the emergence of ELMo, GPT, BERT, XLNet, T5, and GPT-3 in the last two years. In this paper, we analyze and classify the existing pre-training technologies from four aspects: language model, feature extractor, contextual representation, and word representation. We discuss the main issues and development trends of pre-training technologies in current natural language processing.
  • Survey
    DENG Yiyi, WU Changxing, WEI Yongfeng, WAN Zhongbao, HUANG Zhaohua
    2021, 35(9): 30-45.
    Named entity recognition (NER), as one of the basic tasks in natural language processing, aims to identify the required entities and their types in unstructured text. In recent years, various named entity recognition methods based on deep learning have achieved much better performance than that of traditional methods based on manual features. This paper summarizes recent named entity recognition methods from the following three aspects: 1) A general framework is introduced, which consists of an input layer, an encoding layer and a decoding layer. 2) After analyzing the characteristics of Chinese named entity recognition, this paper introduces Chinese NER models which incorporate both character-level and word-level information. 3) The methods for low-resource named entity recognition are described, including cross-lingual transfer methods, cross-domain transfer methods, cross-task transfer methods, and methods incorporating automatically labeled data. Finally, the conclusions and possible research directions are given.
  • Machine Translation
    YE Na, LI Tianyu, CAI Dongfeng, XU Jia
    2021, 35(9): 46-57.
    Translation quality estimation (QE) technology refers to evaluating machine translation results without reference translations. Current neural translation quality estimation models can implicitly learn the syntactic structure of the source language, but they cannot effectively capture the syntactic relationships within sentences from the perspective of linguistics. This paper proposes a method to integrate the syntactic relationship information of the source sentence into neural translation quality estimation, jointly considering the internal dependency relationships of the source language and the translation quality. Experimental results show that the syntactic feature can improve the performance of the model. Finally, we used an ensemble learning algorithm to integrate multiple other linguistic features to obtain the best performance.
  • Machine Translation
    GUO Junjun, TIAN Yingfei, YU Zhengtao, GAO Shengxiang, YAN Wanying
    2021, 35(9): 58-65.
    Pseudo-parallel sentence pair extraction is a key method for improving the performance of low-resource machine translation such as Chinese-Vietnamese. Existing methods based on deep learning frameworks do not consider that different words differ in how difficult they are to represent semantically, which leads to insufficient semantic information in sentence representations, low quality of extracted sentences, and high noise. To solve this problem, this paper proposes a semantic representation network that combines a bidirectional LSTM with semantic adaptive coding. Chinese and Vietnamese sentences are first encoded, and adaptive representation is then applied to mine the semantic information of different words in the sentence, yielding deep representations of the Chinese and Vietnamese sentences. The deep representation vectors are then mapped to a common semantic space to maximize the semantic similarity between sentences, producing higher-quality Chinese-Vietnamese pseudo-parallel sentences. The experimental results show that the model improves the F1 score by 5.09% over the baseline model.
  • Ethnic Language Processing and Cross Language Processing
    YANG Feiyang, CUI Rongyi, ZHAO Yahui, JIN Jing, LI Feiyu
    2021, 35(9): 66-74.
    A Hierarchically Structured Korean (HS-K) model is proposed in this article to construct an effective Korean representation by combining deep reinforcement learning with a self-attention mechanism. Applying the Actor-Critic approach in reinforcement learning, the model takes the text classification result as the label feedback for reinforcement learning and treats the parsing task as a sequential decision task. The experimental results show that the model can identify the key syntactic structure of Korean, with results comparable to manual tagging.
  • Ethnic Language Processing and Cross Language Processing
    HAO Yongbin, ZHOU Lanjiang, LIU Chang
    2021, 35(9): 75-81.
    Laotian is an alphabetic language written without spaces between words. The existing segmentation algorithms for Laotian mainly use rules to segment syllables first, and then segment words according to the results of syllable segmentation. This paper proposes an end-to-end Laotian word segmentation method based on neural networks. With multi-task joint learning, Lao syllable segmentation and word segmentation are jointly processed via a BiLSTM. Experiments show that the precision of the proposed method reaches 89.02%, outperforming previous word segmentation models.
  • Information Extraction and Text Mining
    WANG Xiaoxu, LIU Xiaoxia
    2021, 35(9): 82-93.
    Protein complexes are significant for understanding cell organization and function, and identifying complexes from protein-protein interaction (PPI) networks by computational methods is a hot research topic. To overcome the noise issue in PPI networks, this paper proposes a protein complex identification algorithm (NOBEL) via supervised learning based on the topological information of protein complexes. First, NOBEL constructs a weighted PPI network based on the proteins' biological information and topological information, so as to reduce the noise in the network. Then, the topological information of complexes is extracted from the weighted and unweighted PPI networks as features for the supervised model. Finally, the trained model is applied to identify protein complexes from PPI networks. Experiments on four real PPI networks show that, compared with seven other protein complex identification algorithms, NOBEL improves F-measure by at least 4.39% on Gavin, 1.32% on DIP, 2.39% on WI-PHI_core and 2.34% on WI-PHI_extend, respectively.
  • Information Extraction and Text Mining
    YANG Siqin, ZHANG Xiaochen, JIANG Minghu
    2021, 35(9): 94-101.
    In this study, positive, neutral and negative facial emotions were presented to subjects. The N170, an event-related potential (ERP) component evoked around the temporo-occipital region and related to the perception of facial emotions, was recorded in order to explore the effect of reading literary fiction on facial emotion recognition. The reading group read literary fiction between two sessions of facial emotion recognition tests, while the control group did not. Compared with the first session, the amplitude of N170 in the second session increased. However, the reading process appeared to suppress the amplitude growth of N170. Moreover, the more positive the facial emotion of the stimuli was, the stronger the suppression effect was. This evidence indicates that reading does affect facial emotion recognition. The study speculates that reading may inhibit facial emotion specificity in the brain, thereby possibly improving the perception of facial emotions.
  • Information Retrieval and Question Answering
    ZENG Lanjun, PENG Minlong, LIU Yaqi, XU Liaosa, WEI Zhongyu, HUANG Xuanjing
    2021, 35(9): 102-112.
    Hashtag recommendation has received considerable attention in recent years. Most existing deep learning methods formulate this task as a multi-class classification problem that categorizes tweets into a fixed number of target classes. However, as new hashtags are continuously introduced by users with daily bursts of news, these methods cannot handle new hashtags without retraining. To solve this problem, we propose to convert the hashtag recommendation task into a few-shot learning problem. In addition, we incorporate users' preferences for hashtag usage to reduce the complexity of the recommendation algorithm. Experimental results on a real-world dataset demonstrate that our method achieves a significant performance improvement over the state-of-the-art methods and is more robust.
  • Information Retrieval and Question Answering
    WU Kun, ZHOU Xiabing, LI Zhenghua, LIANG Xingwei, CHEN Wenliang
    2021, 35(9): 113-122.
    Path selection, as a key step in the Knowledge Base Question Answering (KBQA) task, relies on the semantic similarity between a question and candidate paths. To deal with the massive number of unseen relations in the test set, a method based on dynamic sampling of negative examples is proposed to enrich the relations in the training set. In the prediction phase, two path pruning methods, i.e., the classification method and the beam search method, are compared to tackle the explosion of candidate paths. On the CCKS 2019-CKBQA evaluation data set containing simple and complex problems, the proposed method achieves an average F1 value of 0.694 for the single-model system, and 0.731 for the ensemble system.
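    The beam search pruning mentioned in this abstract can be illustrated with a minimal sketch. The abstract gives no details of the paper's scoring model or search procedure, so everything here is an assumption made for illustration: the function name `beam_search_paths`, the toy graph, and the keyword-overlap score that stands in for a learned question-path similarity.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]  # a sequence of relation names


def beam_search_paths(
    graph: Dict[str, List[Tuple[str, str]]],
    start: str,
    score: Callable[[Path], float],
    beam_width: int = 2,
    max_hops: int = 2,
) -> List[Tuple[Path, float]]:
    """Expand relation paths hop by hop, keeping only the top `beam_width` per hop."""
    beam: List[Tuple[Path, str]] = [((), start)]  # (path so far, current entity)
    kept: List[Tuple[Path, float]] = []
    for _ in range(max_hops):
        candidates: List[Tuple[Path, str]] = []
        for path, entity in beam:
            for relation, target in graph.get(entity, []):
                candidates.append((path + (relation,), target))
        if not candidates:
            break
        # Pruning step: only the best-scoring paths survive to the next hop.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        beam = candidates[:beam_width]
        kept.extend((path, score(path)) for path, _ in beam)
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Hypothetical toy knowledge graph: entity -> [(relation, target entity)].
toy_graph = {
    "Q": [("born_in", "A"), ("works_at", "B"), ("likes", "C")],
    "A": [("capital_of", "D"), ("located_in", "E")],
    "B": [("founded_by", "F")],
}
# Hypothetical stand-in for a similarity model: fraction of relations in the
# path that also appear among keywords extracted from the question.
question_keywords = {"born_in", "located_in"}


def overlap_score(path: Path) -> float:
    return sum(1 for r in path if r in question_keywords) / len(path)


results = beam_search_paths(toy_graph, "Q", overlap_score, beam_width=2, max_hops=2)
```

    With a beam width of 2, the low-scoring `likes` branch is dropped after the first hop and never expanded, which is how this kind of pruning limits the growth of candidate paths.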
  • Information Retrieval and Question Answering
    YUAN Hao, WANG Yong
    2021, 35(9): 123-131.
    The Conditional Variational Autoencoder (CVAE) is applied in multi-turn dialogues to improve the diversity of the responses. Most CVAE-based models fail to capture long dependencies in the context. Moreover, existing methods cannot explicitly handle the difference between context utterances and source utterances. This paper combines the Transformer with CVAE to capture the long dependencies in the dialogue. The context utterances are encoded separately, and their information is directed to the source utterances, with gated structures controlling the information fusion between context utterances and source utterances. Experiments show that the proposed model produces responses with higher diversity and better quality.
  • Natural Language Understanding and Generation
    ZHANG Hu, ZHANG Ying, YANG Zhizhuo, QIAN Yili, LI Ru
    2021, 35(9): 132-140.
    Automatic answering of reading comprehension questions in the college entrance examination is a challenge in the machine reading comprehension task. At present, the number of available question-answer pairs for Chinese reading comprehension in the college entrance examination is limited, and deep learning methods are hindered by the small scale of the experimental data. This paper proposes to adapt the traditional EDA data augmentation method to reading comprehension in the college entrance examination. To deal with the long contexts in reading materials, a dynamic material clipping method based on a sliding window is proposed. In addition, a method for evaluating the quality of sentences in the reading material is designed based on similarity calculation. The experimental results show that all three strategies can improve automatic answering of reading comprehension questions in the college entrance examination, with an increase in accuracy of 5% or more.
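    A sliding-window clipping step of the kind described in this abstract might look like the sketch below. The abstract does not specify the window size, the stride, or the relevance measure, so these are all assumptions: the function name `clip_material` is hypothetical, relevance is approximated by plain token overlap with the question, and whitespace tokenization is used only to keep the example self-contained (Chinese text would need a proper word segmenter).

```python
from typing import List


def clip_material(
    sentences: List[str], question: str, window: int = 3, stride: int = 1
) -> List[str]:
    """Return the `window` consecutive sentences that overlap most with the question."""
    if len(sentences) <= window:
        return list(sentences)  # material is already short enough
    question_tokens = set(question.lower().split())
    best_start, best_score = 0, -1
    # Slide the window over the material; the earliest best window wins ties.
    for start in range(0, len(sentences) - window + 1, stride):
        chunk = " ".join(sentences[start:start + window]).lower()
        overlap = len(question_tokens & set(chunk.split()))
        if overlap > best_score:
            best_start, best_score = start, overlap
    return sentences[best_start:best_start + window]


# Hypothetical reading material and question, for illustration only.
material = [
    "The river froze in winter.",
    "Children skated on the ice.",
    "The old bridge was rebuilt.",
    "Merchants crossed the bridge daily.",
    "Spring floods damaged the market.",
]
clipped = clip_material(material, "Who crossed the bridge", window=2)
```

    Here the window covering the third and fourth sentences shares the most tokens with the question, so those two sentences are kept as the clipped material passed on to the answering model.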