2022 Volume 36 Issue 2 Published: 25 March 2022
  

  • Survey
    HU Han, LIU Pengyuan
    2022, 36(2): 1-11.
    As an important part of constructing structured knowledge, relation classification has attracted much attention in natural language processing. However, in many application fields (e.g., medicine and finance), it is very difficult to collect sufficient data to train a relation classification model. In recent years, research on few-shot learning, which relies on only a small number of training samples, has emerged in various fields. This paper systematically reviews recent models and methods for few-shot relation classification. According to their measurement methods, existing methods are divided into prototype-based and distributed ones; according to whether additional information is used, they are divided into pretraining and non-pretraining models. In addition to the regular few-shot setting, we also summarize cross-domain few-shot learning and few-few-shot learning, discuss the limitations of current few-shot relation classification methods, and analyze the technical challenges faced by cross-domain few-shot models. Finally, we discuss future directions for few-shot relation classification.
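    Prototype-based methods of this kind are typically built on the prototypical-network idea: average the support embeddings of each relation into a prototype and assign a query to the nearest prototype. The following is a minimal sketch of that generic idea, not of any specific surveyed model; it assumes sentence embeddings are already computed (plain lists of floats) and uses squared Euclidean distance, and the relation names and vectors are invented for illustration.

```python
from typing import Dict, List

Vector = List[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Average the support embeddings of each relation into one prototype vector."""
    protos = {}
    for relation, vectors in support.items():
        dim = len(vectors[0])
        protos[relation] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, Vector]) -> str:
    """Return the relation whose prototype is closest to the query embedding."""
    return min(protos, key=lambda r: squared_euclidean(query, protos[r]))


if __name__ == "__main__":
    # A toy 2-way 2-shot episode with hand-made 2-D "embeddings" (hypothetical values).
    support = {
        "founder_of": [[1.0, 0.0], [0.8, 0.2]],
        "born_in": [[0.0, 1.0], [0.2, 0.9]],
    }
    protos = prototypes(support)
    print(classify([0.9, 0.1], protos))  # founder_of
```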
  • Language Analysis and Calculation
    SUN Chao, QU Weiguang, WEI Tingxin, GU Yanhui, LI Bin, ZHOU Junsheng
    2022, 36(2): 12-21.
    A serial-verb sentence is a sentence containing several coordinated verbs. The grammatical structure and semantic relations of serial-verb sentences are very complicated, which makes them hard to recognize automatically. This paper proposes a neural network-based model for recognizing serial-verb sentences. The method first preprocesses the corpus with rules, then uses BERT, a multi-layer CNN, and a BiLSTM to jointly extract features for classification, thereby completing the sentence recognition task. Experimental results show that the model achieves an accuracy of 92.71% and an F1 value of 87.41%.
  • Language Analysis and Calculation
    MAO Dazhan, LI Huayong, SHAO Yanqiu
    2022, 36(2): 22-28.
    To adapt a dependency parser that performs well in a single domain to other domains, this paper proposes a new semi-supervised method based on adversarial learning. We design a shared dual-encoder structure based on adversarial learning and introduce domain-private auxiliary tasks and orthogonal constraints. We also explore the effectiveness and performance of a variety of pre-trained models on the cross-domain dependency parsing task.
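    In shared-private architectures, an orthogonal constraint is commonly implemented as the squared Frobenius norm of the product of the shared and private feature matrices, which is zero when the two feature spaces are orthogonal. The sketch below shows only that common formulation; the exact loss used in this paper may differ, and the nested lists stand in for encoder outputs.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens/samples, columns = feature dimensions


def orthogonality_loss(shared: Matrix, private: Matrix) -> float:
    """Squared Frobenius norm of shared^T @ private; zero when the feature columns are orthogonal."""
    if len(shared) != len(private):
        raise ValueError("shared and private must have the same number of rows")
    d_s, d_p = len(shared[0]), len(private[0])
    loss = 0.0
    for i in range(d_s):
        for j in range(d_p):
            # Entry (i, j) of shared^T @ private: dot product of shared column i and private column j.
            entry = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, private))
            loss += entry ** 2
    return loss


if __name__ == "__main__":
    shared = [[1.0, 0.0], [0.0, 0.0]]
    private = [[0.0, 0.0], [0.0, 1.0]]
    print(orthogonality_loss(shared, private))  # 0.0
    print(orthogonality_loss(shared, shared))   # 1.0
```

    Adding this term to the training loss pushes the shared encoder and the domain-private encoder to capture different information.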
  • Language Analysis and Calculation
    TANG Yuling, ZHANG Yufei, YU Dong
    2022, 36(2): 29-39.
    This paper proposes an improved method for constructing readability corpora and uses it to build a large-scale Chinese sentence readability corpus. We then apply deep learning to evaluate the readability of Chinese sentences and explore how incorporating language difficulty features at different levels affects overall performance. Experimental results show that the accuracy of predicting the absolute difficulty of sentences in this corpus reaches 78.69%, a 15% improvement over previous work.
  • Knowledge Representation and Acquisition
    YANG Yunfei, SUI Zhifang
    2022, 36(2): 40-48.
    With the rapid development of artificial intelligence technology and the large-scale growth of medical data resources, medical knowledge graphs have attracted increasing attention. This paper proposes and implements a multi-view, interactive visualization method and system for medical knowledge graphs. The system provides hierarchical visualization of medical entity classification, semantic graph visualization of entities and relations, and interactive visualization of the conversion from unstructured to structured data. The proposed method helps users analyze and understand the structure of complex knowledge graphs more effectively and thus discover more of the valuable information they contain.
  • Knowledge Representation and Acquisition
    LI Yu, ZHOU Guangyou
    2022, 36(2): 49-57.
    Knowledge Base Question Answering (KBQA) is a natural language processing task that generates refined and accurate answers to natural language questions raised by users. This paper treats question intent identification as the issue common to KBQA across domains, and the mapping between a question and the predicate of a tuple in the knowledge base as the key issue. Specifically, we combine "gated convolution for deep semantics" and "interactive attention mechanism for shallow semantics" into a unified framework via a gated perception mechanism. Experiments on the NLPCC-ICCPOL 2016 KBQA dataset show that the proposed method significantly outperforms the existing CDSSM and BDSSM. In addition, we adapt our method to a commonsense question answering system built on an astronomy commonsense knowledge base.
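    The abstract does not give the form of the gated perception mechanism. A common way to merge two semantic views, assumed here rather than taken from the paper, is an element-wise sigmoid gate computed from both views: g = sigmoid(W[d; s] + b), output = g * d + (1 - g) * s, where d is the deep-semantics vector and s the shallow-semantics vector. The weights below are illustrative placeholders; in a real model they would be learned.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(deep: Vector, shallow: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fuse two equal-length vectors with an element-wise gate g = sigmoid(W [deep; shallow] + b)."""
    concat = deep + shallow  # list concatenation gives [deep; shallow]
    fused = []
    for k in range(len(deep)):
        gate = sigmoid(sum(w * x for w, x in zip(weights[k], concat)) + bias[k])
        # The gate decides, per dimension, how much of each view to keep.
        fused.append(gate * deep[k] + (1.0 - gate) * shallow[k])
    return fused


if __name__ == "__main__":
    deep = [1.0, 0.0]
    shallow = [0.0, 1.0]
    weights = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # zero weights, so gate = sigmoid(bias)
    bias = [0.0, 0.0]
    print(gated_fusion(deep, shallow, weights, bias))  # [0.5, 0.5]
```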
  • Ethnic Language Processing and Cross Language Processing
    LI Xuanda, ZHOU Lanjiang, ZHANG Jian'an
    2022, 36(2): 58-68.
    To construct bilingual parallel sentence pairs, this paper proposes a Chinese-Lao sentence similarity metric that incorporates syntactic information. First, the sentence structures of Chinese and Lao are obtained using templates proposed in this paper. Second, pre-trained Chinese and Lao word representations with syntactic characteristics are mapped into a shared semantic space using a bilingual dictionary. Third, sentence representations are obtained through a Bi-directional Long Short-Term Memory (BiLSTM) network with a self-attention mechanism. Finally, the relative difference and relative product of the two sentence vectors are computed and fed into a fully connected layer to calculate the similarity score. Experimental results show that, compared with current mainstream methods, the proposed method achieves better results with a limited corpus (F1 = 70.24%).
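    The final scoring step can be sketched as follows, assuming that "relative difference" and "relative product" mean the element-wise absolute difference |u - v| and the element-wise product u * v of the two sentence vectors, a common feature pair in sentence-matching models. A single linear layer with a sigmoid stands in for the paper's fully connected network, and its weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def matching_features(u: Vector, v: Vector) -> Vector:
    """Concatenate element-wise |u - v| and u * v (assumed meaning of 'relative difference/product')."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return diff + prod


def similarity_score(u: Vector, v: Vector, weights: Vector, bias: float) -> float:
    """One linear layer plus a sigmoid, standing in for the fully connected scoring network."""
    feats = matching_features(u, v)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    zh = [0.6, 0.8]  # hypothetical Chinese sentence vector
    lo = [0.6, 0.8]  # hypothetical Lao sentence vector
    # Hypothetical weights: penalize differences, reward agreement.
    weights = [-2.0, -2.0, 1.0, 1.0]
    print(round(similarity_score(zh, lo, weights, bias=0.0), 3))
    print(round(similarity_score([1.0, 0.0], [0.0, 1.0], weights, bias=0.0), 3))
```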
  • Ethnic Language Processing and Cross Language Processing
    Pema Tashi, Nima Tashi
    2022, 36(2): 69-75.
    This paper proposes a rule- and statistics-based method for automatic error checking in Tibetan text. First, based on Tibetan spelling grammar, 37 types of deterministic finite automata are constructed to recognize modern Tibetan characters. Then a dictionary is used to identify Sanskrit Tibetan. Finally, mutual information and the t-test difference are used to identify real-word errors in Tibetan text, including word collocation errors and grammatical errors. The test set consists of 100 news articles containing 49 errors. Experiments show that the method effectively finds non-character errors and real-word errors, achieving a recall of 83.7%, a detection accuracy of 70.7%, and an F-measure of 76.7%.
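    The statistical step and the reported scores can be illustrated with standard collocation measures. The sketch below computes pointwise mutual information and the usual corpus-linguistics t-score, (O - E) / sqrt(O), from raw counts; the paper's exact formulas and thresholds are not given in the abstract, and the counts are invented. The last line, treating the reported detection accuracy as precision, checks that it is consistent with the reported recall and F-measure.

```python
import math


def pmi(c12: int, c1: int, c2: int, n: int) -> float:
    """Pointwise mutual information log2(P(w1,w2) / (P(w1) P(w2))) from raw counts."""
    return math.log2(c12 * n / (c1 * c2))


def t_score(c12: int, c1: int, c2: int, n: int) -> float:
    """Collocation t-score (O - E) / sqrt(O), with expected count E = c1 * c2 / n."""
    expected = c1 * c2 / n
    return (c12 - expected) / math.sqrt(c12)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented counts: a word pair seen 8 times, its words 20 and 40 times, 10,000 pairs in total.
    print(round(pmi(8, 20, 40, 10_000), 2))  # 6.64
    print(round(t_score(8, 20, 40, 10_000), 2))  # 2.8
    # Recall 83.7% and precision 70.7% give an F-measure of about 76.7%.
    print(round(f_measure(0.707, 0.837) * 100, 1))  # 76.7
```

    Word pairs with unusually low association scores are candidate collocation errors.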
  • Information Extraction and Text Mining
    ZHOU Yingtong, MENG Jian, GUO Yan, LIU Yue, HE Guangfu, DONG Lin, CHENG Xueqi
    2022, 36(2): 76-84.
    Financial announcements disclose key data on a company's operations and involve complex financial relationships, namely multivariate relationships. This paper designs TextMining, a vertical-domain multivariate relationship extraction method based on dependency trees and frequent subgraph mining. Furthermore, inspired by graph convolutional neural networks, the FTA-GCN algorithm is designed for vertical-domain optimization. On the financial announcement dataset constructed in this paper, the algorithm captures multivariate relationships among common entities, indicating good extraction performance.
  • Information Extraction and Text Mining
    GUO Lihua, LI Yang, WANG Suge, CHEN Xin, FU Yujie, PEI Wensheng
    2022, 36(2): 85-92.
    We observe that entity names in judicial case documents tend to be long and strongly correlated with one another. This paper proposes a named entity recognition method based on a forward maximum matching strategy and a community attention mechanism (FMM-CAM). Specifically, the forward maximum matching strategy captures the longer matching words corresponding to each character in a legal document and, according to position, assigns them to one of four communities: B, M, E, and S. A community self-attention mechanism then assigns different weights to the different communities to obtain better word embeddings. The word embeddings and character embeddings obtained from BERT and Word2Vec are concatenated as input to a bidirectional LSTM, which produces semantic representations of the sentences; a CRF model finally optimizes the tag sequence. Experimental results show that the proposed method effectively determines entity boundaries in legal documents, such as evidence names, proof contents, and file numbers.
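    Forward maximum matching is a classic greedy dictionary lookup: at each position, take the longest dictionary word that starts there, falling back to a single character. The sketch below implements that textbook version and then labels each character B, M, E, or S by its position in the matched word, which is one plausible reading of how the communities are formed. The dictionary, the maximum word length of 6, and the example phrase are hypothetical, and the paper's variant may collect several candidate words per character instead.

```python
from typing import List, Set, Tuple


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Greedy left-to-right segmentation: always take the longest lexicon word at the current position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def bmes_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Label every character by its position in its word: B(egin), M(iddle), E(nd), S(ingle)."""
    tagged = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
            continue
        for k, ch in enumerate(word):
            tag = "B" if k == 0 else "E" if k == len(word) - 1 else "M"
            tagged.append((ch, tag))
    return tagged


if __name__ == "__main__":
    lexicon = {"证据", "证据名称", "名称"}  # hypothetical legal-domain dictionary
    words = forward_max_match("证据名称为", lexicon)
    print(words)  # ['证据名称', '为']
    print(bmes_tags(words))
```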
  • Information Extraction and Text Mining
    GONG Xiaokang, YING Wenhao, WANG Jun, GONG Shengrong
    2022, 36(2): 93-103.
    Previous topic evolution tracking methods are mostly based on topic models, which are limited in extracting and representing text semantics. Based on word embeddings, this paper proposes PDRBL, a text proximity model that combines explicit and implicit similarity to make temporal judgments during topic evolution. Based on PDRBL, this paper defines six topic evolution tenses and the methods for judging them, and then proposes a PDRBL-based topic evolution tracking method (TETP). Experiments show that the proposed models achieve better or comparable precision, recall, and F1, and effectively capture topic evolution paths.
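    The abstract does not give PDRBL's formula. As a generic illustration of combining an explicit (surface) similarity with an implicit (embedding-based) one, the sketch below mixes Jaccard word overlap with the cosine similarity of averaged word vectors using a weight alpha. The toy vectors, the averaging step, and alpha = 0.5 are all assumptions; this is not PDRBL itself.

```python
import math
from typing import Dict, List

Vector = List[float]


def jaccard(a: List[str], b: List[str]) -> float:
    """Explicit similarity: overlap of the two word sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mean_vector(words: List[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary words (zero vector if none are known)."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def combined_similarity(a: List[str], b: List[str], emb: Dict[str, Vector], dim: int, alpha: float = 0.5) -> float:
    """alpha * explicit (Jaccard) + (1 - alpha) * implicit (embedding cosine) similarity."""
    implicit = cosine(mean_vector(a, emb, dim), mean_vector(b, emb, dim))
    return alpha * jaccard(a, b) + (1 - alpha) * implicit


if __name__ == "__main__":
    # Hypothetical 2-D word vectors.
    emb = {"flood": [1.0, 0.0], "rain": [0.9, 0.1], "storm": [0.8, 0.2], "election": [0.0, 1.0]}
    topic_a, topic_b, topic_c = ["flood", "rain"], ["storm", "rain"], ["election"]
    print(round(combined_similarity(topic_a, topic_b, emb, dim=2), 3))
    print(round(combined_similarity(topic_a, topic_c, emb, dim=2), 3))
```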
  • Information Extraction and Text Mining
    ZHU Nana, WANG Hang, ZHANG Jiale, SUN Yingwei
    2022, 36(2): 104-110.
    Quantitative study of policy texts is attractive because conclusions obtained by quantitative approaches can overcome the subjectivity and randomness of qualitative analysis. Existing quantitative approaches to policy text analysis have two drawbacks. First, the data size is small because policy texts are collected manually. Second, the identification of policy texts mainly depends on human experience, which is derived from biased induction on small data. To address these issues, this paper proposes a pretrained language model approach for policy identification that achieves good performance on a large-scale policy dataset.
  • Information Retrieval and Question Answering
    HU Yue, ZHOU Guangyou
    2022, 36(2): 111-120.
    Knowledge base question answering requires a large number of question-answer pairs. To alleviate the burden of data annotation, question generation from knowledge bases has attracted researchers' attention. The task is to automatically generate questions from knowledge base triples. To generate questions with rich and diverse information, this paper uses two encoding layers, Graph Transformer and BERT, to enhance the multi-granular semantic representation of triples and obtain background information. Experimental results on the SimpleQuestions dataset demonstrate the effectiveness of the method.
  • Sentiment Analysis and Social Computing
    CHEN Xiao, WANG Jingjing, LI Shoushan, WEI Siyi, ZHANG Xiaoyu, CHEN Qiang
    2022, 36(2): 121-128.
    Aspect sentiment classification is a fine-grained task in sentiment analysis that aims to judge the sentiment polarity of a given aspect in a text. Cross-lingual aspect sentiment classification mines and classifies the aspect sentiment in target-language text by using the semantic and sentiment information provided by source-language text, which is more challenging than the monolingual task. This paper proposes a multi-channel BERT model (Multi-BERT) for cross-lingual aspect sentiment classification. The approach employs different BERT models to learn semantic features as well as the distinct grammatical features of source- and target-language text. The text representations learned by the multiple BERT models then interact with each other to mine richer aspect sentiment information and improve cross-lingual aspect sentiment classification performance.
  • Sentiment Analysis and Social Computing
    XU Yuemei, SHI Lingyu, CAI Lianqiao
    2022, 36(2): 129-141.
    In cross-lingual sentiment analysis, pre-trained Bilingual Word Embedding (BWE) dictionaries are leveraged to generate vector representations of source- and target-language text. To obtain a high-quality BWE dictionary, a novel model is proposed that uses affective features in the source language as supervision for generating word representations. The pre-trained representations contain both semantic and emotional information, making them suitable for sentiment prediction in the target language. In our cross-lingual sentiment analysis experiments, the source language is English, and the target languages are Chinese, French, German, Japanese, Korean, and Thai. The results show that the accuracy of the proposed model is about 9.3% higher than a Machine Translation (MT)-based method and 8.7% higher than a parallel method without sentiment-aware representations. As expected, the English-German experiments achieve the best performance, since both languages belong to the Germanic language group and are more similar in grammar and semantics.
  • Sentiment Analysis and Social Computing
    XU Xiu, LIU Dexi
    2022, 36(2): 142-151.
    Emotion cause detection is an important research task in sentiment analysis that aims to find the causes of an individual's emotion and its changes in text. To better capture the semantic information, context information, and relative position information of clauses in a text, this paper proposes a Context and Position Interactive Co-attention Neural Network (CPC-ANN) to detect emotion causes. CPC-ANN learns the semantic information of different clauses through the multi-head self-attention mechanism of the Transformer. At the same time, CPC-ANN embeds relative position information into each word of the clauses to provide clues for detecting emotion causes. Experimental results on the EMNLP2016 Chinese emotion cause detection dataset show that CPC-ANN achieves better results than the baseline models.
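    The position clue is the offset of each clause from the emotion clause. A minimal sketch of that idea, assuming the emotion clause index is known and offsets are clipped to a hypothetical window of plus or minus 3 before indexing a position-embedding table, is shown below; the embedding lookup itself and the rest of CPC-ANN are omitted.

```python
from typing import List, Tuple


def relative_positions(num_clauses: int, emotion_idx: int, max_dist: int = 3) -> List[int]:
    """Offset of each clause from the emotion clause, clipped to [-max_dist, max_dist]."""
    return [max(-max_dist, min(max_dist, i - emotion_idx)) for i in range(num_clauses)]


def position_ids(offsets: List[int], max_dist: int = 3) -> List[int]:
    """Shift clipped offsets to non-negative row indices of a position-embedding table."""
    return [o + max_dist for o in offsets]


def attach_positions(clause_tokens: List[List[str]], ids: List[int]) -> List[List[Tuple[str, int]]]:
    """Pair every word with its clause's position id, so each word carries the position clue."""
    return [[(tok, pid) for tok in toks] for toks, pid in zip(clause_tokens, ids)]


if __name__ == "__main__":
    offsets = relative_positions(num_clauses=6, emotion_idx=4)
    print(offsets)                # [-3, -3, -2, -1, 0, 1]
    print(position_ids(offsets))  # [0, 0, 1, 2, 3, 4]
```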
  • Sentiment Analysis and Social Computing
    GUO Hengrui, WANG Zhongqing, ZHU Qiaoming, LI Peifeng
    2022, 36(2): 152-159.
    Event clustering on social text aims to cluster short texts by event content. Current event clustering models are either unsupervised or supervised: unsupervised models suffer from poor performance, while supervised models require large amounts of labeled data. To address these issues, this paper proposes SemiEC, a semi-supervised incremental event clustering model based on a small-scale annotated dataset. The model encodes events with an LSTM and computes text similarity with a linear model. In particular, it uses samples generated by incremental clustering to retrain the model and redistribute uncertain samples. Experimental results show that SemiEC outperforms mainstream clustering algorithms.
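    The incremental clustering step can be pictured as a single-pass procedure: each new text joins the most similar existing cluster if the similarity clears a threshold; otherwise it starts a new cluster. In SemiEC the similarity comes from the LSTM encoder and a linear model. In the sketch below it is plain cosine similarity against running cluster centroids, the threshold of 0.8 is an arbitrary assumption, and the retraining and redistribution of uncertain samples are not shown.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def incremental_cluster(vectors: List[Vector], threshold: float = 0.8) -> List[int]:
    """Single pass: join the closest centroid if it is similar enough, else open a new cluster."""
    centroids: List[Vector] = []
    sizes: List[int] = []
    labels: List[int] = []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the chosen cluster's centroid.
            n = sizes[best]
            centroids[best] = [(m * n + x) / (n + 1) for m, x in zip(centroids[best], vec)]
            sizes[best] += 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    # Hypothetical text encodings: two texts about one event, two about another.
    texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(incremental_cluster(texts))  # [0, 0, 1, 1]
```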
  • Sentiment Analysis and Social Computing
    LI Weijiang, TANG Ming, YU Zhengtao
    2022, 36(2): 160-170.
    Most sentiment classification models assume that samples are balanced across the sentiment categories of a dataset, which may not hold in practice. In this paper, we propose a multi-channel bidirectional GRU sentiment classification method that fuses mixed sampling with cost-sensitive learning, namely the re-balance multi-channel sampling BiGRU (RMS_BiGRU). We first apply a mixed resampling strategy to the dataset. The data produced by the different sampling forms are then fed into different channels. Meanwhile, a re-balance strategy is used in each channel to balance the contributions of new and original training samples. Our method alleviates the dependence on the negative class, so that almost all samples in the sample space contribute equally to training. Experimental results show that the method achieves better classification performance across sentiment categories.
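    The abstract does not specify the sampling or cost scheme. As a hedged illustration of the two ingredients it names, the sketch below builds a mixed resample (randomly undersample classes above a target size and oversample classes below it with replacement) and computes inverse-frequency class weights, a common cost-sensitive choice. The target size (the mean class count), the fixed random seed, and the weighting formula are assumptions; the multi-channel BiGRU and the re-balance strategy are not shown.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def mixed_resample(data: List[Sample], seed: int = 0) -> List[Sample]:
    """Undersample large classes and oversample small ones toward the mean class size."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = {}
    for sample in data:
        by_label.setdefault(sample[1], []).append(sample)
    target = round(len(data) / len(by_label))
    balanced: List[Sample] = []
    for items in by_label.values():
        if len(items) >= target:
            balanced.extend(rng.sample(items, target))  # undersample without replacement
        else:
            balanced.extend(items)
            balanced.extend(rng.choices(items, k=target - len(items)))  # oversample with replacement
    rng.shuffle(balanced)
    return balanced


def class_weights(data: List[Sample]) -> Dict[str, float]:
    """Inverse-frequency weights n / (k * n_c), so errors on rarer classes cost more."""
    counts = Counter(label for _, label in data)
    n, k = len(data), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced toy data: 8 negative and 2 positive reviews.
    data = [(f"neg text {i}", "neg") for i in range(8)] + [(f"pos text {i}", "pos") for i in range(2)]
    resampled = mixed_resample(data)
    print(Counter(label for _, label in resampled))
    print(class_weights(data))  # {'neg': 0.625, 'pos': 2.5}
```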