Journal of Chinese Information Processing

Survey

Progress of Judicial Judgment Prediction Based on Artificial Intelligence

WANG Wanzhen, RAO Yuan, WU Lianwei, LI Xue

2021, 35(9): 1-14.

Artificial intelligence has received increasing attention in judicial practice in recent years. Based on the literature on intelligent models for assisting judicial cases, this paper identifies the following six challenges in legal judgment prediction: multi-feature crime prediction, multi-label crime prediction, multiple sub-task processing, imbalanced data, the interpretability of judgment prediction, and the adaptation of existing algorithms to different types of cases. The paper also provides theoretical discussion, technical analysis, technical challenges, and trend analysis for these problems. The datasets used in this field and the corresponding evaluation metrics are also summarized.

Survey

A Survey of Language Model Based Pre-training Technology

YUE Zengying, YE Xia, LIU Ruiheng

2021, 35(9): 15-29.

Pre-training technology has moved to the center stage of natural language processing, especially with the emergence of ELMo, GPT, BERT, XLNet, T5, and GPT-3 in the last two years. In this paper, we analyze and classify the existing pre-training technologies from four aspects: language model, feature extractor, contextual representation, and word representation. We discuss the main issues and development trends of pre-training technologies in current natural language processing.

Survey

A Survey on Named Entity Recognition Based on Deep Learning

DENG Yiyi, WU Changxing, WEI Yongfeng, WAN Zhongbao, HUANG Zhaohua

2021, 35(9): 30-45.

Named entity recognition (NER), as one of the basic tasks in natural language processing, aims to identify the required entities and their types in unstructured text. In recent years, various named entity recognition methods based on deep learning have achieved much better performance than that of traditional methods based on manual features. This paper summarizes recent named entity recognition methods from the following three aspects: 1) A general framework is introduced, which consists of an input layer, an encoding layer and a decoding layer. 2) After analyzing the characteristics of Chinese named entity recognition, this paper introduces Chinese NER models which incorporate both character-level and word-level information. 3) The methods for low-resource named entity recognition are described, including cross-lingual transfer methods, cross-domain transfer methods, cross-task transfer methods, and methods incorporating automatically labeled data. Finally, the conclusions and possible research directions are given.
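
As a rough illustration of what the decoding layer in such a framework has to produce, the sketch below turns a predicted tag sequence into entity spans. The BIO tagging scheme, the function name `bio_to_spans`, and the sample tags are assumptions made for illustration; they are not taken from the surveyed paper.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, entity_type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, etype = None, None
    # A trailing "O" acts as a sentinel so the last open span is flushed.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Stray I- tag without a matching B-: treat it as a new entity.
            start, etype = i, tag[2:]
    return spans


# Hypothetical model output for a four-token sentence.
print(bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))
```

Treating a stray `I-` tag as the start of a new entity is one common convention; stricter evaluations discard such tags instead.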

Machine Translation

Dependency Relationship Enhanced Neural Machine Translation Quality Estimation

YE Na, LI Tianyu, CAI Dongfeng, XU Jia

2021, 35(9): 46-57.

Translation quality estimation (QE) technology refers to evaluating machine translation results without reference translations. Current neural translation quality estimation models can implicitly learn the syntactic structure of the source language, but they cannot effectively capture the syntactic relationships within sentences from the perspective of linguistics. This paper proposes a method to integrate the syntactic relationship information of the source sentence into neural translation quality estimation, jointly considering the internal dependency relationships of the source language and the translation quality. Experimental results show that the syntactic feature can improve the performance of the model. Finally, we used an ensemble learning algorithm to integrate multiple other linguistic features to obtain the best performance.

Machine Translation

Pseudo-Parallel Sentence Pair Extraction for Chinese-Vietnamese Based on Semantic Adaptive Coding

GUO Junjun, TIAN Yingfei, YU Zhengtao, GAO Shengxiang, YAN Wanying

2021, 35(9): 58-65.

Pseudo-parallel sentence pair extraction is a key method for improving the performance of low-resource machine translation such as Chinese-Vietnamese. Existing methods based on deep learning frameworks do not consider that different words differ in how difficult they are to represent semantically, which leads to insufficient semantic information for sentences, low quality of extracted sentences, and high noise. To solve this problem, this paper proposes a semantic representation network framework that combines a bidirectional LSTM with semantic adaptive coding. The idea is to first encode the Chinese and Vietnamese sentences, and then apply adaptive representation to mine the semantic information of the different words in each sentence, yielding a deep representation of the Chinese and Vietnamese sentences. The deep representation vectors are then mapped into a unified common semantic space to maximize the semantic similarity between sentences and obtain higher-quality Chinese-Vietnamese pseudo-parallel sentences. The experimental results show that the model improves the F₁ score by 5.09% over the baseline model.
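
The final step, selecting sentence pairs by similarity in a shared space, can be sketched as follows. This is only a minimal illustration: the toy vectors stand in for the encoder output, and cosine similarity, the threshold of 0.8, and the names `cosine` and `extract_pairs` are assumptions rather than details given in the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_pairs(
    src: Dict[str, Sequence[float]],
    tgt: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """For each source sentence, keep its nearest target if similar enough."""
    pairs = []
    for s_id, s_vec in src.items():
        best_id, best_sim = None, -1.0
        for t_id, t_vec in tgt.items():
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_id, best_sim = t_id, sim
        if best_id is not None and best_sim >= threshold:
            pairs.append((s_id, best_id, best_sim))
    return pairs


# Toy 2-dimensional "sentence embeddings" (hypothetical values).
zh = {"zh1": [1.0, 0.0], "zh2": [0.0, 1.0]}
vi = {"vi1": [0.9, 0.1], "vi2": [-1.0, 0.0]}
pairs = extract_pairs(zh, vi)
```

Here only `zh1` finds a target above the threshold, so `zh2` is left unpaired rather than matched to a poor candidate.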

Ethnic Language Processing and Cross Language Processing

Key Syntactic Structure Recognition Based on Reinforcement Learning and Self-Attention for Korean

YANG Feiyang, CUI Rongyi, ZHAO Yahui, JIN Jing, LI Feiyu

2021, 35(9): 66-74.

A Hierarchically Structured Korean (HS-K) model is proposed in this article to construct an effective Korean representation by combining deep reinforcement learning with a self-attention mechanism. Applying the Actor-Critic approach in reinforcement learning, the model takes the text classification result as the label feedback for reinforcement learning and treats the parsing task as a sequence decision task. The experimental results show that the model can identify the key syntactic structures of Korean, with results comparable to manual tagging.

Ethnic Language Processing and Cross Language Processing

An End-to-end Multi Task Method for Laotian Word Segmentation via LSTM

HAO Yongbin, ZHOU Lanjiang, LIU Chang

2021, 35(9): 75-81.

Laotian is an alphabetic language written without spaces between words. The existing segmentation algorithms for Laotian mainly use rules to segment syllables first, and then segment words according to the results of syllable segmentation. This paper proposes an end-to-end Laotian word segmentation method based on neural networks. With multi-task joint learning, Lao syllable segmentation and word segmentation are processed jointly via a BiLSTM. Experiments show that the precision of the proposed method reaches 89.02%, outperforming previous word segmentation models.
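
For readers unfamiliar with how word segmentation precision is usually computed, a minimal sketch follows. It assumes the common span-matching definition (a predicted word counts as correct only if both of its boundaries match a gold word); the abstract does not state the exact evaluation protocol, and the function names and placeholder strings are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to character-offset spans (start, end_exclusive)."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def segmentation_precision(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predicted words whose span exactly matches a gold word."""
    if "".join(predicted) != "".join(gold):
        raise ValueError("segmentations must cover the same text")
    pred_spans, gold_spans = to_spans(predicted), to_spans(gold)
    if not pred_spans:
        return 0.0
    return len(pred_spans & gold_spans) / len(pred_spans)


# Latin placeholders stand in for Lao text: one of three predicted words is right.
precision = segmentation_precision(["ab", "c", "de"], ["ab", "cd", "e"])
```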

Information Extraction and Text Mining

NOBEL: A Protein Complex Identification Method Based on Topological Information and Supervised Learning

WANG Xiaoxu, LIU Xiaoxia

2021, 35(9): 82-93.

Protein complexes are significant for understanding cell organization and function, and identifying complexes from protein-protein interaction (PPI) networks by computational methods is one of the hot research topics. To overcome the noise issue in PPI networks, this paper proposes a protein complex identification algorithm (NOBEL) that uses supervised learning based on the topological information of protein complexes. First, NOBEL constructs a weighted PPI network based on the biological and topological information of proteins, so as to reduce the noise in the network. Then, complex topological information is extracted from the weighted and unweighted PPI networks as features for the supervised model. Finally, the trained model is applied to identify protein complexes from PPI networks. Experiments on four real PPI networks show that, compared with seven other protein complex identification algorithms, NOBEL improves the F-measure by at least 4.39% on Gavin, 1.32% on DIP, 2.39% on WI-PHI_core, and 2.34% on WI-PHI_extend.

Information Extraction and Text Mining

Reading Literary Fiction Affects Face Emotion Recognition: An ERP Evidence

YANG Siqin, ZHANG Xiaochen, JIANG Minghu

2021, 35(9): 94-101.

In this study, positive, neutral and negative facial emotions were presented to subjects. The N170, an event-related potential (ERP) component elicited around the temporo-occipital region and related to the perception of facial emotions, was recorded in order to explore the effect of reading literary fiction on facial emotion recognition. The reading group read literary fiction between two sessions of facial emotion recognition tests, while the control group did not. Compared with the first session, the amplitude of the N170 increased in the second session. However, reading appeared to suppress this growth in N170 amplitude. Moreover, the more positive the facial emotion of the stimulus was, the stronger the suppression effect was. This evidence suggests that reading does affect facial emotion recognition. The study speculates that reading may inhibit facial emotion specificity in the brain, thereby possibly improving the perception of facial emotions.

Information Retrieval and Question Answering

Personalized Hashtag Recommendation Using Few-shot Learning

ZENG Lanjun, PENG Minlong, LIU Yaqi, XU Liaosa, WEI Zhongyu, HUANG Xuanjing

2021, 35(9): 102-112.

Hashtag recommendation has received considerable attention in recent years. Most existing deep learning methods formulate this task as a multi-class classification problem that categorizes tweets into a fixed number of target classes. However, as users continuously introduce new hashtags with daily bursts of news, these methods cannot handle new hashtags without retraining. To solve this problem, we propose converting the hashtag recommendation task into a few-shot learning problem. In addition, we incorporate users' preferences for hashtag usage to reduce the complexity of the recommendation algorithm. Experimental results on a real-world dataset demonstrate that our method achieves significant performance improvement over the state-of-the-art methods and is more robust.

Information Retrieval and Question Answering

Path Selection for Chinese Knowledge Base Question Answering

WU Kun, ZHOU Xiabing, LI Zhenghua, LIANG Xingwei, CHEN Wenliang

2021, 35(9): 113-122.

Path selection, as a key step in the Knowledge Base Question Answering (KBQA) task, relies on the semantic similarity between a question and candidate paths. To deal with the massive number of unseen relations in the test set, a method based on dynamic sampling of negative examples is proposed to enrich the relations in the training set. In the prediction phase, two path pruning methods, i.e., the classification method and the beam search method, are compared to tackle the explosion of candidate paths. On the CCKS 2019-CKBQA evaluation data set, which contains simple and complex questions, the proposed method achieves an average F₁ value of 0.694 for the single-model system, and 0.731 for the ensemble system.
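
To make the beam search pruning idea concrete, here is a minimal sketch over a toy knowledge base. Everything in it is an assumption for illustration: the entities and relations are invented, and the word-overlap `overlap_score` merely stands in for the learned question-path similarity model, whose details the abstract does not give.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]                  # sequence of relation names
KB = Dict[str, List[Tuple[str, str]]]   # entity -> [(relation, neighbor)]


def beam_search_paths(
    kb: KB,
    topic_entity: str,
    score: Callable[[Path], float],
    beam_size: int = 2,
    max_hops: int = 2,
) -> List[Tuple[Path, str]]:
    """Expand relation paths hop by hop, keeping the top `beam_size` per hop."""
    beam: List[Tuple[Path, str]] = [((), topic_entity)]
    results: List[Tuple[Path, str]] = []
    for _ in range(max_hops):
        candidates = [
            (path + (rel,), nxt)
            for path, ent in beam
            for rel, nxt in kb.get(ent, [])
        ]
        if not candidates:
            break
        # Prune: only the highest-scoring partial paths are expanded further.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        beam = candidates[:beam_size]
        results.extend(beam)
    results.sort(key=lambda c: score(c[0]), reverse=True)
    return results


# Invented toy knowledge base.
toy_kb: KB = {
    "Alice": [("spouse", "Bob"), ("birthplace", "City_X"), ("height", "170cm")],
    "Bob": [("birthplace", "City_Y"), ("profession", "teacher")],
    "City_X": [("country", "Country_Z")],
}
question_words = {"spouse", "birthplace"}  # "Where was Alice's spouse born?"


def overlap_score(path: Path) -> float:
    """Reward relations mentioned in the question, penalize the others."""
    matched = sum(1 for rel in path if rel in question_words)
    return matched - 0.5 * (len(path) - matched)


best_path, answer = beam_search_paths(toy_kb, "Alice", overlap_score)[0]
```

With a beam size of 2, the `height` relation is dropped after the first hop, so its continuations are never scored; this is the trade-off beam search makes against exhaustive enumeration of candidate paths.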

Information Retrieval and Question Answering

SCT-CVAE: Transformer Based Dialogue Model via Separate Context and CVAE

YUAN Hao, WANG Yong

2021, 35(9): 123-131.

The Conditional Variational Autoencoder (CVAE) is applied in multi-turn dialogues to improve the diversity of the responses. Most CVAE-based models fail to capture long dependencies in context. Meanwhile, existing methods cannot explicitly deal with the difference between context utterances and source utterances. This paper combines the Transformer with CVAE to capture the long dependencies in the dialogue. By encoding the context utterances separately, the information of the context is directed to the source utterances, with gated structures controlling the information fusion between context utterances and source utterances. Experiments show that the proposed model has higher response diversity and better quality.

Natural Language Understanding and Generation

Data Augmentation Based Automatic Answering of Reading Comprehension in College Entrance Examination

ZHANG Hu, ZHANG Ying, YANG Zhizhuo, QIAN Yili, LI Ru

2021, 35(9): 132-140.

Automatic answering of reading comprehension questions in the college entrance examination is a challenge in the machine reading comprehension task. At present, the number of available question-answer pairs for Chinese reading comprehension in the college entrance examination is limited, and deep learning methods are hindered by the small scale of the experimental data. This paper proposes adapting the traditional EDA data augmentation method to reading comprehension in the college entrance examination. To deal with the long contexts in reading materials, a dynamic material clipping method based on a sliding window is proposed. In addition, a method for evaluating the quality of sentences in the reading material is designed based on similarity calculation. The experimental results show that all three strategies can improve automatic answering of reading comprehension questions in the college entrance examination, with an increase in accuracy of 5% or more.
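
Two of the standard EDA operations (random swap and random deletion) and a simple sliding-window split can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's adapted method: the window here has a fixed size rather than being dynamic, and all function names, parameters, and sample data are hypothetical.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """EDA-style random swap: exchange two random positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """EDA-style random deletion: drop each token with probability p."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]


def sliding_windows(sentences: List[str], window: int, stride: int) -> List[List[str]]:
    """Split a long passage (a list of sentences) into overlapping windows."""
    if len(sentences) <= window:
        return [list(sentences)]
    starts = list(range(0, len(sentences) - window + 1, stride))
    if starts[-1] + window < len(sentences):
        starts.append(len(sentences) - window)  # make sure the tail is covered
    return [sentences[s:s + window] for s in starts]


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
tokens = "the passage describes a quiet village".split()
augmented = [random_swap(tokens, 1, rng), random_deletion(tokens, 0.2, rng)]
chunks = sliding_windows([f"s{i}" for i in range(8)], window=3, stride=2)
```

The other two EDA operations, synonym replacement and random insertion, need a synonym resource and are omitted here.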

2021 Volume 35 Issue 9 Published: 30 September 2021