2019 Volume 33 Issue 8 Published: 20 August 2019
  

  • Language Analysis and Calculation
    XIA Qiaolin, SUI Zhifang, CHANG Baobao, ZHAN Weidong, ZHANG Kunli, KE Yonghong
    2019, 33(8): 1-11.
    Natural language understanding involves multiple categories of meaning, including propositions, modality, and temporal logic. Most studies of shallow semantics focus on the analysis of propositional meaning. Without support for conceptual meaning and deep logical meaning, such analysis can hardly help computers achieve deep understanding of and reasoning over text. Based on the theories of argument structure, event semantics, and construction grammar, this paper goes beyond the limitations of shallow semantic analysis (e.g., semantic role labeling) and establishes a deep semantic representation system for concepts and logic. Using a layered annotation strategy, a large-scale Chinese corpus annotated with deep semantics is constructed, which also verifies the completeness and coverage of the representation system in real practice. The establishment of this theoretical system and the construction of these language resources are expected to promote the development of Chinese automatic semantic analysis and artificial intelligence.
  • Language Analysis and Calculation
    XU Sheng, WANG Tishuang, LI Peifeng, ZHU Qiaoming
    2019, 33(8): 12-19,35.
    Chinese implicit discourse relation recognition is challenging because the semantic information of the arguments is difficult to capture. This paper proposes a Three-Layer Attention Network (TLAN) that simulates the human bidirectional reading strategy and repeated reading process to recognize Chinese implicit discourse relations between arguments. First, the two arguments are encoded by a self-attention layer. Then, an interactive attention layer simulates the bidirectional reading strategy and generates argument representations containing interactive information, and the external memory of the argument pair is obtained through a nonlinear transformation. Finally, an attention layer with external memory simulates the repeated reading process to generate the final representations of the arguments. Experimental results on the CDTB show that TLAN outperforms various strong baselines in both micro-F1 and macro-F1.
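    All three layers of TLAN are built on attention. The sketch below shows plain scaled dot-product attention, used once as self-attention inside one argument and once as interactive attention from one argument to the other. The toy vectors and the function names `softmax` and `attend` are illustrative assumptions; the abstract does not specify TLAN's exact scoring function or the external-memory layer.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query returns a weighted average of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


# Hypothetical 2-dimensional word vectors for the two arguments.
arg1 = [[1.0, 0.0], [0.0, 1.0]]
arg2 = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]

self_att = attend(arg1, arg1, arg1)    # self-attention inside argument 1
inter_att = attend(arg1, arg2, arg2)   # argument 1 "reads" argument 2
```

    Running `attend` in both directions (arg1 over arg2 and arg2 over arg1) is one simple way to mimic the bidirectional reading the abstract describes.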
  • Language Analysis and Calculation
    GE Haizhu, KONG Fang, ZHOU Guodong
    2019, 33(8): 20-27.
    Elementary Discourse Unit (EDU) recognition is a fundamental task in discourse analysis. This paper proposes a Chinese EDU recognition approach based on theme-rheme theory, which casts EDU identification as theme-rheme recognition. Themes and rhemes are detected with a sequence labeling approach, and once their boundaries are identified, they are merged to obtain the EDU boundaries. In contrast to related work on EDU recognition, the proposed approach pays more attention to the internal structure of EDUs. Experiments on the Chinese Discourse Topic Corpus (CDTC) show the effectiveness of the approach, which achieves an F1-score of 89.46%.
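    The merge step can be sketched as follows, assuming a hypothetical tag scheme in which every token carries `B-T`/`I-T` (theme) or `B-R`/`I-R` (rheme); the abstract does not give the paper's actual label set or merge rules. Each theme is joined with the rheme that follows it, and a rheme without a preceding theme forms an EDU on its own.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect B-/I- runs into labelled spans. Assumes every token is tagged T or R (no 'O')."""
    spans: List[Span] = []
    for i, tag in enumerate(tags):
        prefix, label = tag.split("-", 1)
        if prefix == "B" or not spans or spans[-1][0] != label:
            spans.append((label, i, i + 1))
        else:
            lab, start, _ = spans[-1]
            spans[-1] = (lab, start, i + 1)
    return spans


def merge_to_edus(tags: List[str]) -> List[Tuple[int, int]]:
    """Merge each theme with the following rheme into one EDU (token offsets, end exclusive)."""
    edus: List[Tuple[int, int]] = []
    theme_start: Optional[int] = None
    for label, start, end in bio_to_spans(tags):
        if label == "T":
            theme_start = start  # a later theme overrides an unmatched earlier one
        else:
            edus.append((theme_start if theme_start is not None else start, end))
            theme_start = None
    return edus


tags = ["B-T", "I-T", "B-R", "I-R", "I-R", "B-R", "B-T", "B-R", "I-R"]
edus = merge_to_edus(tags)
```

    In this toy sequence the second rheme has no theme of its own, so it yields a one-span EDU; a trailing theme with no rheme would simply be dropped by this sketch.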
  • Language Analysis and Calculation
    WU Ruiying, KONG Fang
    2019, 33(8): 28-35.
    Event coreference resolution is a challenging task in natural language processing, because events carry rich information, are expressed in many ways, and are sparsely distributed in text. Existing research addresses this task with heuristic information such as word matching and syntactic structure via feature engineering, which fails to handle events involving complex semantics. This paper proposes an end-to-end neural event coreference resolution model that learns the semantic content of the context through multiple word representations, Bi-directional Recurrent Neural Networks (Bi-RNN) and an attention mechanism. Experiments on the KBP 2015 and 2016 datasets show the effectiveness of the proposed approach, which achieves an F1-measure of 39.9% under the CoNLL evaluation standard.
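    Under the usual convention of the CoNLL and KBP coreference scorers, the CoNLL score is the unweighted mean of the MUC, B³ and CEAF-e F1 values. The sketch below implements only B³ and the final average; it assumes key and response clusters cover the same mention ids, and the mention ids are hypothetical. It is not the official scorer.

```python
from typing import Dict, List, Set, Tuple

Clusters = List[Set[str]]


def _cluster_of(clusters: Clusters) -> Dict[str, Set[str]]:
    # Map every mention id to the cluster that contains it.
    return {m: c for c in clusters for m in c}


def b_cubed(key: Clusters, response: Clusters) -> Tuple[float, float, float]:
    """B-cubed precision, recall and F1, assuming both sides cover the same mentions."""
    key_of, resp_of = _cluster_of(key), _cluster_of(response)
    mentions = list(key_of)
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def conll_score(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """CoNLL score: unweighted mean of the MUC, B-cubed and CEAF-e F1 values."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3


key = [{"e1", "e2", "e3"}, {"e4"}]
response = [{"e1", "e2"}, {"e3", "e4"}]
p, r, f = b_cubed(key, response)
```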
  • Language Analysis and Calculation
    GE Donglai, LI Junhui, ZHU Muhua, LI Shoushan, ZHOU Guodong
    2019, 33(8): 36-45.
    Sequence-to-sequence (seq2seq) approaches formalize AMR parsing as translation from a source sentence to a target AMR graph. However, previous studies generally model the source sentence as a plain word sequence and ignore its inherent syntactic and semantic role information. In this paper, we propose a straightforward yet effective approach that incorporates the syntactic and semantic role information of the source sentence into seq2seq-based AMR parsing. Experimental results show that our approach achieves a significant improvement of 6.7% in F1 score on an English benchmark dataset. Further in-depth analysis from various perspectives reveals how source syntactic and semantic role information benefits AMR parsing. The analysis also shows that POS information and the segmentation of words into subwords contribute most to the improvement, followed by the other syntactic and semantic role information.
  • Language Resources Construction
    WANG Hongbin, FENG Yinhan, YU Zhengtao, WEN Yonghua
    2019, 33(8): 46-52.
    This paper proposes a cross-language word embedding method based on a small dictionary and unbalanced monolingual corpora. The method first normalizes the monolingual word vectors and obtains initial values for gradient descent from the small-dictionary words via an optimal orthogonal linear transformation. Then the large-scale source-language (English) corpus is clustered, and the source-language words in each cluster are looked up in the dictionary. This yields the average word vector of each cluster together with the corresponding source-language and target-language word vectors. New bilingual word-vector correspondences are thus established and added to the small dictionary. Finally, the generalized extended dictionary is used to run gradient descent on the cross-language word embedding mapping model to obtain the optimal mapping. Experiments on English-Italian, English-German and English-Finnish show that the method reduces the number of gradient descent iterations and the training time of cross-language word embedding while preserving good accuracy.
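    The pipeline can be sketched on a 2-dimensional toy space: normalize vectors, add a cluster-mean pair to the seed dictionary, then fit a linear map W by gradient descent on the mean of ‖xW − z‖². The word pairs, the learning rate and the identity initialization (standing in for the paper's orthogonal initialization) are all assumptions for illustration; real systems use high-dimensional vectors and a linear-algebra library.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def normalize(v: Vec) -> Vec:
    # Scale a word vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors: List[Vec]) -> Vec:
    # Average word vector of one cluster.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def map_vec(x: Vec, W: Mat) -> Vec:
    # Row vector times matrix: (xW)_j = sum_i x_i * W[i][j].
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def fit_mapping(src: List[Vec], tgt: List[Vec], W0: Mat, lr: float = 0.1, steps: int = 500) -> Mat:
    """Batch gradient descent on the mean of ||xW - z||^2 over dictionary pairs (x, z)."""
    d_in, d_out = len(W0), len(W0[0])
    W = [row[:] for row in W0]
    for _ in range(steps):
        grad = [[0.0] * d_out for _ in range(d_in)]
        for x, z in zip(src, tgt):
            err = [p - t for p, t in zip(map_vec(x, W), z)]
            for i in range(d_in):
                for j in range(d_out):
                    grad[i][j] += 2.0 * x[i] * err[j] / len(src)
        W = [[W[i][j] - lr * grad[i][j] for j in range(d_out)] for i in range(d_in)]
    return W


# Hypothetical seed dictionary: two word pairs in a 2-dimensional toy space.
src = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
tgt = [normalize([0.0, 1.0]), normalize([-1.0, 0.0])]

# Dictionary extension: pair a source cluster's mean vector with the mean of its translations.
src.append(normalize(mean_vector(src[:2])))
tgt.append(normalize(mean_vector(tgt[:2])))

identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for the orthogonal initialisation
W = fit_mapping(src, tgt, identity)
```

    Here the pairs are consistent with a 90-degree rotation, so gradient descent recovers it; with real data the extended pairs mainly add constraints that can speed up convergence, which is the effect the abstract reports.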
  • Other Languages in/around China
    WANG Zhijuan, LIU Feifei, ZHAO Xiaobing, SONG Wei
    2019, 33(8): 53-59.
    To reduce the cost of labeling training data for low-resource languages, active learning is a promising method that selects informative, non-redundant data. This paper proposes four confidence-based active learning methods, with parameters set empirically. Experimental results show that by selecting data with confidence below 0.7 and running 6 labeling iterations with up to 3.2 MB of training data, we achieve an F-measure of 0.88 for Tibetan name recognition. Compared with the 10 MB of training data that a CRF model needs to achieve the same performance (within a difference of 0.01%), the active learning approach reduces the annotation scale by 66%.
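    A confidence-threshold active learning loop with the reported settings (threshold 0.7, 6 iterations) might look like the sketch below. The `train` callback, the `batch_size` cap and the distance-based toy model are hypothetical; the abstract does not say which of its four confidence measures is used or how a CRF confidence is computed.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def active_learning(
    pool: List[T],
    seed: List[T],
    train: Callable[[List[T]], Callable[[T], float]],
    threshold: float = 0.7,
    iterations: int = 6,
    batch_size: int = 100,
) -> List[T]:
    """Each round, retrain and send the least confident samples (below threshold) to annotation."""
    labeled = list(seed)
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(iterations):
        confidence = train(labeled)                   # retrain on everything labelled so far
        uncertain = [x for x in unlabeled if confidence(x) < threshold]
        if not uncertain:
            break                                     # model is confident on the whole pool
        uncertain.sort(key=confidence)                # least confident first
        batch = uncertain[:batch_size]
        labeled.extend(batch)                         # stand-in for manual annotation
        unlabeled = [x for x in unlabeled if x not in batch]
    return labeled


def toy_train(labeled: List[int]) -> Callable[[int], float]:
    # Hypothetical model: confidence decays with distance to the nearest labelled example.
    def confidence(x: int) -> float:
        return max(0.0, 1.0 - 0.1 * min(abs(x - l) for l in labeled))
    return confidence


selected = active_learning(list(range(20)), [0], toy_train, batch_size=3)
```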
  • Other Languages in/around China
    GULINIGEER Abudouwaili, TUERGEN Yibulayin, KAHAERJIANG Abiderexiti, WANG Lulu
    2019, 33(8): 60-66.
    Stemming is a basic task in Uyghur natural language processing (NLP), and Uyghur stemming is still challenged by over-segmentation, non-segmentation and ambiguous segmentation. This paper proposes a Bi-LSTM-CRF neural network model based on bidirectional long short-term memory networks (Bi-LSTMs) and conditional random fields (CRFs). The model takes the Uyghur character as the minimal linguistic unit, extracts character features, phonological features and phonetic features, and uses them as candidate features. The Bi-LSTM-CRF model achieves an F-score of 88% on Uyghur stemming, with a further increase of 1.8% after incorporating manual features.
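    In a Bi-LSTM-CRF tagger, the CRF layer typically picks the best tag sequence with Viterbi decoding over emission scores (from the Bi-LSTM) plus tag-transition scores. The sketch below assumes a hypothetical two-tag scheme, `S` for stem characters and `A` for affix characters, and hand-set scores in place of trained Bi-LSTM outputs, applied to the Uyghur word *kitablar* (kitab + -lar).

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Highest-scoring tag path under a linear-chain CRF (sum of emission and transition scores)."""
    best = {t: emissions[0][t] for t in tags}  # best path score ending in each tag
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):  # follow back-pointers from the last position
        path.append(pointers[path[-1]])
    return path[::-1]


word = "kitablar"
# Hypothetical per-character scores; the second 'a' locally prefers S.
emissions = [{"S": 2.0, "A": 0.0}] * 5 + [
    {"S": 0.0, "A": 1.5},  # l
    {"S": 1.0, "A": 0.5},  # a
    {"S": 0.0, "A": 1.0},  # r
]
# A stem character may not follow an affix character.
transitions = {("S", "S"): 0.0, ("S", "A"): 0.0, ("A", "A"): 0.0, ("A", "S"): -5.0}
path = viterbi(emissions, transitions, ["S", "A"])
stem = "".join(c for c, t in zip(word, path) if t == "S")
```

    Greedy per-character decoding would tag the second `a` as `S`; the transition penalty lets Viterbi recover the consistent split *kitab* + *lar*.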
  • Information Extraction and Text Mining
    ZHANG Ying, WANG Zhongqing, WANG Hongling
    2019, 33(8): 67-76.
    Single-document extractive summarization aims to extract the most relevant sentences to represent the core content of a document. To exploit nucleus-satellite relations, which can indicate the importance of sentences, this paper proposes a neural approach that jointly models nucleus-satellite relation extraction and text summarization. The model considers both the semantic and structural information of the text, and extracts the most relevant and important sentences as the summary. Experimental results show that the method significantly improves ROUGE scores compared with current mainstream single-document extractive summarization methods.
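    ROUGE-N recall, the core of the ROUGE metrics mentioned above, counts clipped n-gram overlap between a system summary and a reference. The minimal sketch below computes recall only; the official ROUGE toolkit adds options such as stemming and stopword removal and also reports precision and F-measure. The example sentences are made up.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    # Multiset of the n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by the number of reference n-grams."""
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    overlap = sum((ref & cand).values())  # Counter '&' keeps the minimum count per n-gram
    total = sum(ref.values())
    return overlap / total if total else 0.0


reference = "the cat sat on the mat".split()
summary = "the cat lay on the mat".split()
r1 = rouge_n_recall(summary, reference, 1)
r2 = rouge_n_recall(summary, reference, 2)
```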
  • Information Extraction and Text Mining
    WU Wentao, LI Peifeng, ZHU Qiaoming
    2019, 33(8): 77-83.
    Entity extraction and event extraction aim to detect entities and events in text, respectively. Previous studies in information extraction usually treated entity extraction and event extraction as two separate tasks, without capturing the close relationship between them. This paper proposes a hybrid neural network that extracts entities and events simultaneously and exploits the dependencies between them. The network first uses an encoder-decoder bidirectional LSTM module to identify entities, and then feeds the entity context information from this module into a neural network that combines self-attention and gated convolution for event extraction. Experimental results on the ACE 2005 English corpus show that our model outperforms state-of-the-art methods.
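    The abstract does not define its gated convolution; a common form is the gated linear unit, output = conv_a(X) ⊙ σ(conv_b(X)). The sketch below assumes that form, with "same" zero padding and a hypothetical kernel layout `kernel[offset][in_dim][out_dim]`; it is an illustration, not the paper's layer.

```python
import math
from typing import List

Vec = List[float]
Kernel = List[List[Vec]]  # kernel[offset][input_dim][output_dim]


def conv1d(xs: List[Vec], kernel: Kernel, bias: Vec) -> List[Vec]:
    """1-D convolution over a sequence of vectors with zero padding; output length equals input."""
    width, d_in, d_out = len(kernel), len(xs[0]), len(bias)
    pad = width // 2
    zeros = [0.0] * d_in
    padded = [zeros] * pad + xs + [zeros] * pad
    out = []
    for t in range(len(xs)):
        window = padded[t:t + width]
        out.append([bias[o] + sum(window[k][i] * kernel[k][i][o]
                                  for k in range(width) for i in range(d_in))
                    for o in range(d_out)])
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv(xs: List[Vec], kernel_a: Kernel, bias_a: Vec,
               kernel_b: Kernel, bias_b: Vec) -> List[Vec]:
    """Gated linear unit: conv_a(X) multiplied element-wise by sigmoid(conv_b(X))."""
    a = conv1d(xs, kernel_a, bias_a)
    b = conv1d(xs, kernel_b, bias_b)
    return [[ai * sigmoid(bi) for ai, bi in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy example: width-3 summing kernel, gate fixed at sigmoid(0) = 0.5.
xs = [[1.0], [2.0], [3.0]]
sum_kernel = [[[1.0]], [[1.0]], [[1.0]]]
zero_kernel = [[[0.0]], [[0.0]], [[0.0]]]
h = gated_conv(xs, sum_kernel, [0.0], zero_kernel, [0.0])
```

    In a trained layer the gate kernel is learned, so the gate decides per position and dimension how much of the convolved signal (for example, entity context around a trigger candidate) passes through.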
  • Information Extraction and Text Mining
    LI Qingqing, YANG Zhihao, LUO Ling, LIN Hongfei, WANG Jian
    2019, 33(8): 84-92.
    Biomedical relation extraction plays an important role in biomedical text mining, as it can automatically extract high-quality biomedical relationships from biomedical texts. In this paper, we apply a neural network-based multi-task learning method to explore the correlation among multiple biomedical relation extraction tasks. We construct a fully-shared model (FSM) and a shared-private model (SPM), and propose an attention-based main-auxiliary model (Att-MAM). Experimental results on five public biomedical relation extraction datasets show that multi-task learning outperforms single-task methods.
  • Question Answering, Dialogue System and Machine Reading Comprehension
    DONG Xiaozheng, HONG Yu, ZHU Fenhong, YAO Jianmin, ZHU Qiaoming
    2019, 33(8): 93-100.
    The question generation task aims to automatically generate one or more questions from a declarative sentence based on an understanding of its semantics. This paper focuses on one of its sub-tasks, Point-wise Question Generation (PQG), and proposes a seq2seq PQG model with a token-oriented attention mechanism. Here, a token is a general summary of a potential sentence-level answer, usually appearing as a series of consecutive words in the declarative sentence. During encoding, the position information of the token is integrated with the semantic information of the whole sentence; during decoding, attention to the token is strengthened. Experiments on the SQuAD corpus show an improvement of 1.98% in BLEU-4.
  • Question Answering, Dialogue System and Machine Reading Comprehension
    CHEN Zhigang, HUA Lei, LIU Quan, YIN Kun, WEI Si, HU Guoping
    2019, 33(8): 101-110.
    This paper proposes an automatic sentence completion method that combines dependency parsing with deep neural networks. First, a sequence modeling method based on syntactic information expansion is proposed, which preserves efficiency while exploiting syntactic information. On this basis, we apply learning to rank to train a candidate answer ranking model. Second, to address the lack of detail in whole-sequence modeling, an automatic sentence completion model based on multi-state information fusion of a language model is proposed. Finally, a multi-source information fusion model combining sentence representation, dependency syntax, and multi-state information is designed. This paper also constructs an English sentence completion dataset. Experimental results on this dataset show that the dependency syntax expansion model achieves an absolute improvement of 11% over the baseline sequence modeling methods, the language-model-based state ranking technique achieves an absolute improvement of 9.3% over the baseline model, and the final multi-source information fusion model achieves the highest accuracy, 76.9%, on the test set.
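    One common learning-to-rank objective for choosing among completion candidates is a pairwise hinge loss: the gold candidate should outscore every other candidate by a margin. The abstract does not state which ranking loss the paper uses, so the loss below, its `margin` parameter and the toy scores are assumptions; `accuracy` mirrors the reported metric (top-ranked candidate equals the gold answer).

```python
from typing import List


def pairwise_hinge_loss(scores: List[float], gold: int, margin: float = 1.0) -> float:
    """Sum over non-gold candidates of max(0, margin - (score_gold - score_other))."""
    return sum(max(0.0, margin - (scores[gold] - s))
               for i, s in enumerate(scores) if i != gold)


def accuracy(all_scores: List[List[float]], golds: List[int]) -> float:
    """Fraction of questions whose top-ranked candidate is the gold answer."""
    hits = sum(max(range(len(s)), key=s.__getitem__) == g for s, g in zip(all_scores, golds))
    return hits / len(golds)


# Hypothetical model scores for the candidates of two questions.
batch_scores = [[2.0, 0.5, 1.5], [0.1, 0.9]]
batch_golds = [0, 0]
loss = pairwise_hinge_loss(batch_scores[0], batch_golds[0])
acc = accuracy(batch_scores, batch_golds)
```

    In the first question only the third candidate is within the margin of the gold one, so it alone contributes to the loss.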
  • Sentiment Analysis and Social Computing
    LIU Haijiao, MA Huifang, CHANG Yang, LI Zhixin
    2019, 33(8): 111-120.
    In this paper, we propose a target community detection method based on an entropy-weighted attribute subspace, in order to detect communities related to user preferences. First, the similarity between nodes is calculated from both attributes and structure, and the center node set of the target community is obtained by extending the user-given sample node with its neighbors. Second, an entropy-weighted attribute weighting method is established on the center node set, from which the attribute subspace of the target community is captured. Third, the edge weights of the network are recomputed from the similarity between nodes under the captured attribute subspace weights. Finally, the community function is defined and further improved based on the weights of the current network. The target community matching the user's preference, which is densely connected internally and well separated from other communities, is then detected starting from the center node set. In addition, our method can be extended to multiple community detection and outlier detection. Experimental results on artificial and real network datasets demonstrate the efficiency and effectiveness of the proposed algorithm.
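    The abstract does not give its entropy-weighting formula. As a hypothetical illustration only, the sketch below weights each categorical attribute by how much the center nodes agree on it: one minus the normalized entropy of the attribute's value distribution, renormalized to sum to 1. The attribute names and values are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def attribute_weights(center_nodes: List[Dict[str, str]], attributes: List[str]) -> Dict[str, float]:
    """Hypothetical entropy-based weights: attributes the center nodes agree on weigh more.

    Requires at least two center nodes so that the maximum entropy log(k) is positive.
    """
    k = len(center_nodes)
    max_entropy = math.log(k)  # entropy when all k nodes take different values
    raw: Dict[str, float] = {}
    for a in attributes:
        counts = Counter(node[a] for node in center_nodes)
        entropy = -sum((c / k) * math.log(c / k) for c in counts.values())
        raw[a] = 1.0 - entropy / max_entropy  # 1 = full agreement, 0 = all values differ
    total = sum(raw.values())
    if total == 0:
        return {a: 1.0 / len(attributes) for a in attributes}
    return {a: v / total for a, v in raw.items()}


# Hypothetical center node set built from a user's sample node and its neighbours.
centers = [
    {"city": "Beijing", "field": "NLP"},
    {"city": "Shanghai", "field": "NLP"},
    {"city": "Beijing", "field": "NLP"},
]
weights = attribute_weights(centers, ["city", "field"])
```

    Under this assumption `field`, shared by every center node, dominates the subspace, while `city` keeps a smaller weight.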
  • Sentiment Analysis and Social Computing
    ZHONG Zhaoman, DAI Hongwei, GUAN Yan
    2019, 33(8): 121-131.
    Event-based social networks (EBSNs) provide convenient online platforms for users to organize, attend and share social events. Focusing on recommending upcoming events to users in EBSNs, we present a hybrid Event-Sponsor-User (ESU) graph model that incorporates events, event sponsors (event groups) and users, to best capture the entities and their complex social relations in EBSNs. Because users' interests are driven by a complex set of factors, we propose a multi-factor event recommendation model based on the ESU graph, covering social influence, event content, location and time. According to the characteristics of the ESU graph, entity importance is estimated by bidirectional random walk with restart (BD-RWR). A comprehensive performance evaluation on real-world datasets collected from DoubanEvent shows that the proposed method is more effective than state-of-the-art methods.
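    Random walk with restart scores every node by the stationary probability of a walker that follows edges but jumps back to the query node with probability `restart` at each step. The power-iteration sketch below treats the graph as undirected (each edge listed in both directions) and does not reproduce the paper's bidirectional variant or its multi-factor weights; the toy ESU graph and the restart value 0.15 are assumptions.

```python
from typing import Dict, List


def random_walk_with_restart(graph: Dict[str, List[str]], source: str,
                             restart: float = 0.15, iters: int = 100) -> Dict[str, float]:
    """Power iteration: r <- (1 - restart) * (walk one step from r) + restart * e_source."""
    nodes = list(graph)
    scores = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                nxt[source] += (1 - restart) * scores[n]  # dangling node: send its mass home
                continue
            share = (1 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        nxt[source] += restart
        scores = nxt
    return scores


# Hypothetical ESU graph: users (u), events (e) and a sponsor (s), edges stored both ways.
esu = {
    "u1": ["e1"],
    "e1": ["u1", "s1"],
    "s1": ["e1", "e2"],
    "e2": ["s1", "u2"],
    "u2": ["e2", "e3"],
    "e3": ["u2"],
}
scores = random_walk_with_restart(esu, "u1")
ranked_events = sorted((n for n in esu if n.startswith("e")), key=scores.get, reverse=True)
```

    Ranking the event nodes by these scores gives a simple graph-proximity recommendation for `u1`; the scores always sum to 1 because each step redistributes exactly the current mass.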
  • NLP Application
    TANG Feng, LIANG Xun, ZHAO Xiaolei, ZHANG Xuan, CHENG Hengchao
    2019, 33(8): 132-142.
    In full-length knight-errant novels, the protagonists are mostly knights and martyrs with distinctive personalities, and a nickname can summarize a character's most prominent features. To recognize such nicknames, this paper proposes a method that combines OOV extension recognition and screening with syntactic patterns. The OOV extension recognition and screening method combines the expansion and screening of left-neighbor strings. Syntactic patterns are applied to identify candidate nickname substrings within the observation window using nickname indicators. This paper also defines concepts such as candidate nickname substrings and optional nickname substrings. A high-frequency word list of martial arts novels and a low-frequency indicator dictionary are derived through statistics and classification. The results show that the method is feasible and effective.