2019 Volume 33 Issue 2 Published: 25 February 2019
  

  • Language Analysis and Calculation
    JIA Honghao, LUO Zhiyong
    2019, 33(2): 1-7.
    The automatic recognition of inter-sentence quotation relationships is a worthwhile issue in discourse analysis. The quotation relationship between sentences influences the analysis of sentence groups. At present, there are few studies on quotation relationships in natural language processing. This paper makes a preliminary exploration of the relationship between quoted sentences and studies the identification of quotations with conditional random fields (CRF) and a bidirectional long short-term memory network enhanced with CRF (BLSTM-CRF). It introduces the governors in the leading sentence into the model. The experimental results show that the CRF model achieves the better precision (85.49%), while the BLSTM model achieves the better F-value (79.60%).
  • Language Analysis and Calculation
    ZHANG Kunli, HAN Yingjie, JIA Yuxiang, MU Lingling, SUI Zhifang, ZAN Hongying
    2019, 33(2): 8-16.
    Logical complement semantics refers to the meaning expressed by elements of negation, degree, tense and aspect, modality and mood that are attached to a basic predicate-centered proposition in a sentence. It is embodied as the semantic constraint relation between logical semantic operators and the predicate. As an effective supplement to the semantic relations expressed by the elements of a basic proposition, logical complement semantics is important for deep understanding of sentence semantics. This paper proposes a Chinese logical complement semantic annotation framework for deep semantic understanding. Specifically, classification systems and operator dictionaries are constructed for representing negation, degree, tense and aspect, and mood, based on existing research results. Annotation rules are established to annotate logical complement semantics for sentences that have already been tagged for basic propositional arguments. Finally, statistics on the annotation results are presented, and the problems encountered in the annotation process are analyzed.
  • Language Analysis and Calculation
    JIN Tianhua, JIANG Shan, YU Dong, ZHAO Meiqian, LIU Lu
    2019, 33(2): 17-25.
    Recognizing textual entailment (RTE) is a challenging issue in natural language processing. This paper proposes to categorize textual entailment into three types: lexical entailment, chunk-based heterogeneous entailment and common-sense entailment. Focusing on chunk-based heterogeneous entailment, we further present a chunk annotation standard and a labeled dataset. We then explore a rule-based model and a deep learning model for the automatic detection of chunk entailments. The experimental results show that the deep learning model adopted in this paper can discover the entailment fragments effectively.
  • Language Analysis and Calculation
    FENG Wenhe, GUO Haifang, YANG Hua
    2019, 33(2): 26-35.
    This paper identifies and analyzes the English translation of the Chinese De-construction that expresses a conditional relation in legal texts. Quantitative investigation of the English translation of the Chinese De-construction in the legal text of the General Principles of the Civil Law shows that: 1. There are more adverbial clauses than attributive clauses (85.40% > 14.60%). 2. Finite forms appear more frequently than non-finite forms (87.59% > 12.41%); “present time” accounts for the absolute majority of the finite forms (99.17%), and prepositional phrases account for the majority of the non-finite forms (64.71%). 3. “If” ranks first among the adverbial introductory words (86.32%), and “who” among the attributive introductory words (60.00%). This paper suggests that the De-construction in Chinese legal texts is a clause rather than a phrase, and that the word “De” is a discourse connective indicating a conditional relation.
  • Language Resources Construction
    GUO Lijuan, PENG Xue, LI Zhenghua, ZHANG Min
    2019, 33(2): 34-42.
    Existing Chinese dependency treebanks are mainly annotated on canonical texts and give little consideration to web texts such as blogs, Weibo, and WeChat. This paper presents a large-scale treebank annotation effort based on a recently designed annotation guideline and an online annotation system. Altogether, 15 part-time annotators are involved, and a strict annotation procedure is applied to guarantee quality. So far, we have annotated about 30,000 Chinese sentences with their dependency syntax trees, including about 10,000 sentences from Taobao headline texts. This paper describes the details of data selection and the annotation workflow. We also analyze the annotation accuracy, inter-annotator consistency, and distribution of the annotated data.
  • Language Resources Construction
    WU Ruizhu, LI Hanjing, LV Huihua, YAO Dengfeng
    2019, 33(2): 43-50.
    The construction of a parallel corpus of Chinese and sign language is significant for machine translation and contrastive language studies. The corpus presented in this paper consists of sign language video, information about the collectors and annotators, and 14 layers of labeling information (manual or non-manual) produced with the multimedia labeling software ELAN. Cosine similarity based on the vector space model (VSM) is adopted to remove duplicates from the corpus. It is also used to test the similarity of the experts' annotations to ensure the quality of the corpus.
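The VSM-based deduplication mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain bag-of-words vector per sentence and a similarity threshold of 0.9; the function names, the threshold, and the term weighting are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists under a bag-of-words VSM."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(sentences, threshold=0.9):
    """Keep a sentence only if it is below the threshold against all kept ones."""
    kept = []
    for tokens in sentences:
        if all(cosine_similarity(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept
```

The pairwise comparison is quadratic in the number of sentences, which is acceptable for a sketch but may need indexing for a large corpus.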
  • Knowledge Representation and Acquisition
    PENG Min, YAO Yalan, XIE Qianqian, GAO Wang
    2019, 33(2): 51-58.
    Knowledge representation learning has attracted much attention in natural language processing, with encouraging results on tasks such as entity linking, relation extraction, and question answering. However, most existing models use only the structural information of the knowledge graph and cannot handle new entities or entities with few facts very well. This paper proposes a joint knowledge representation model that utilizes both entity descriptions and structural information. Firstly, we introduce convolutional neural network models to encode the entity description. Then, we design an attention mechanism to select the valid information in the text. Moreover, we introduce the position vector as supplementary information. Finally, a gating mechanism is applied to integrate the structural and textual information into the joint representation. Experimental results show that our models outperform other baselines on link prediction and triplet classification tasks.
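As a rough illustration of how a gating mechanism can merge a structural embedding with a textual one, the sketch below uses a common element-wise form, g * structural + (1 - g) * textual with g = sigmoid(logit). The exact gate used in the paper is not given in the abstract, so this form, the function names, and the pre-computed gate logits are all assumptions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(structural, textual, gate_logits):
    """Element-wise gate: g * structural + (1 - g) * textual."""
    if not (len(structural) == len(textual) == len(gate_logits)):
        raise ValueError("all vectors must have the same length")
    gates = [sigmoid(z) for z in gate_logits]
    return [g * s + (1.0 - g) * t for g, s, t in zip(gates, structural, textual)]
```

With all gate logits at zero, each gate is 0.5 and the result is the plain average of the two vectors; large positive logits favor the structural side.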
  • Ethnic Language Processing and Cross Language Processing
    LONG Congjun, LIU Huidan, ZHOU Maoke
    2019, 33(2): 59-66.
    The longest noun phrases carry abundant syntactic and semantic information and correspond to a syntactic component in most cases. By comparing the essence of different longest noun phrases, this paper defines the longest noun phrase in the Tibetan language from the perspective of the syntactic tree. A total of 6,038 sentences are extracted from a Tibetan treebank, and the structure types, boundary features and frequencies of the longest noun phrases are analyzed. Two approaches, a sequence labeling model and a parsing algorithm, are investigated to detect the longest noun phrases in Tibetan. Experiments show the better performance of the sequence labeling approach, which yields 87.14% precision, 84.72% recall and 85.92% F-value.
  • Ethnic Language Processing and Cross Language Processing
    BAN Mabao, CAI Zhijie, LAMA Zhaxi
    2019, 33(2): 67-74.
    The syntactic analysis of Tibetan interrogative sentences has broad application prospects in areas such as Tibetan question answering systems, search engines, and information extraction and retrieval. By analyzing the features of Tibetan interrogative sentences, this paper classifies them and summarizes the structural features of each class. The PCFG method is used to parse the Tibetan interrogative sentences. The experiment yields 96.0%, 95.4% and 95.7% in accuracy, recall and F-value, respectively.
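The three figures in this abstract are consistent with the standard balanced F-measure, assuming the value reported as accuracy is precision in the usual sense. The short sketch below shows the computation; the function name and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures from the abstract, in percent.
print(round(f_measure(96.0, 95.4), 1))
```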
  • Ethnic Language Processing and Cross Language Processing
    DOU Gecao, CAI Rangzhuoma, NAN Cuoji, SUAN Taiben
    2019, 33(2): 75-80.
    Speech synthesis is one of the core technologies of human-computer interaction. With the development of neural networks, neural speech synthesis has attracted more and more attention. After analyzing the structure and spelling rules of Tibetan characters, this paper studies Tibetan speech synthesis by combining a sequence-to-sequence model with an attention mechanism. The experimental results show that this method performs well on Tibetan speech synthesis.
  • Ethnic Language Processing and Cross Language Processing
    ZHANG Jing, XU Shuang, HE Jianjun, LI Min, ZHENG Ruirui
    2019, 33(2): 81-88.
    An important step in Manchu document analysis is the segmentation and extraction of Manchu words from large images of Manchu documents. This paper proposes a new Manchu word segmentation and extraction method based on seam carving. First, the number of text lines is detected by a projection profile matching method, and the lines are then painted. Secondly, the minimum energy line between adjacent text lines is located by dynamic programming from bottom to top, and the best segmentation lines that do not cut through Manchu word components are determined by restraining the midline areas. Finally, the independent Manchu text columns and Manchu words are extracted according to the segmentation curve. Experimental results show that this method achieves better segmentation and extraction results on Manchu document image datasets.
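The dynamic-programming step of seam carving can be sketched as follows. This is a generic minimum-energy vertical seam search over a small energy grid, accumulated from the bottom row upward; it does not include the midline restriction or the projection-profile step described in the abstract, and the function name and toy grid are hypothetical.

```python
def min_energy_seam(energy):
    """Return, for each row from top to bottom, the column of the cheapest seam."""
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Accumulate costs from the bottom row upward: each cell adds the
    # cheapest of its (up to three) neighbors in the row below.
    for r in range(rows - 2, -1, -1):
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols - 1, c + 1)
            cost[r][c] += min(cost[r + 1][lo:hi + 1])
    # Start at the cheapest top cell and follow the cheapest neighbor down.
    c = min(range(cols), key=lambda j: cost[0][j])
    seam = [c]
    for r in range(1, rows):
        lo, hi = max(0, c - 1), min(cols - 1, c + 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r][j])
        seam.append(c)
    return seam


toy_energy = [
    [9, 1, 9],
    [9, 9, 1],
    [9, 1, 9],
]
print(min_energy_seam(toy_energy))  # [1, 2, 1]
```

In a real document image the energy would typically come from gradient magnitude or ink density, so that the seam passes through blank space between text lines rather than through strokes.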
  • Information Extraction and Text Mining
    LI Lishuang, QIAN Shuang, ZHOU Anqiao, LIU Yang, GUO Yuankai
    2019, 33(2): 89-96.
    Drug-drug interaction (DDI) extraction is an important issue in biomedical relation extraction. Most existing methods emphasize key information such as the entities and positions in a sentence. To further exploit the sentence structure, this paper proposes a DDI extraction model based on an attention mechanism over the dependency. The correlation between the shortest dependency path and the sentence is measured to capture the useful information. Firstly, the model uses BiGRU networks to learn the semantic and context information of the original sentence and of the Shortest Dependency Path (SDP), respectively. Secondly, the SDP information is incorporated into the original sentence information through the attention mechanism. Finally, the resulting sentence representation is used to classify and predict DDI. The approach is evaluated on the DDIExtraction 2013 corpus, yielding a micro F-score of 73.72%.
  • Sentiment Analysis and Social Computing
    WANG Yan, TANG Jie
    2019, 33(2): 97-104.
    Network representation learning is a popular topic in social network analysis, and this paper evaluates existing network representation learning algorithms on network data with different structures. To assess the effectiveness, efficiency and application limits of the algorithms, we use the multi-label classification of network nodes to compare ten algorithms from three categories on eight data sets. The experimental results show that deep learning algorithms such as DeepWalk perform stably and well on various types of networks, and that the application of algorithms based on matrix factorization is limited by their high space complexity.
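DeepWalk, named in this abstract, builds node sequences with truncated random walks and then trains a skip-gram model on them. The sketch below covers only the walk-generation half, under the assumption of an unweighted graph stored as an adjacency dictionary; the function name, parameters, and toy graph are hypothetical, and the embedding training step is omitted.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
walks = random_walks(toy_graph, walk_length=4, walks_per_node=2)
```

Each walk can then be treated like a sentence whose "words" are node identifiers, which is what allows a word-embedding model to be reused for nodes.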
  • Sentiment Analysis and Social Computing
    GUAN Pengfei, LI Bao'an, LV Xueqiang, ZHOU Jianshe
    2019, 33(2): 105-111.
    To deal with sentiment analysis at the sentence level, this paper proposes an attention-enhanced bidirectional LSTM. It employs an attention mechanism to learn the weight distribution of each word's sentiment tendency directly from the word vectors. Tested on the NLPCC 2014 sentiment analysis dataset, the model outperforms other sentence-level sentiment classification models.
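The idea of weighting words by attention can be illustrated with a minimal sketch: per-word scores are normalized with a softmax, and the weights are used to pool the per-word vectors into one sentence vector. How the scores are produced in the paper is not described in the abstract, so the scores here are simply assumed to be given, and all names are hypothetical.

```python
import math


def attention_pool(hidden_states, scores):
    """Softmax the scores and return (weights, weighted sum of hidden states)."""
    top = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled
```

Words with higher scores contribute more to the pooled vector, which is then what a classifier layer would consume.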
  • Sentiment Analysis and Social Computing
    YU Shengwei, LU Qi, CHEN Wenliang
    2019, 33(2): 112-121.
    Fine-grained opinion mining aims at detecting sentiment units and determining sentiment polarity in opinion text. Recent methods are mostly based on sequence labeling models and rarely use the information in sentiment lexicon resources. This paper proposes a fine-grained opinion mining method based on a feature representation of a domain sentiment lexicon. It generates the feature representation from the domain sentiment lexicon and applies it as input to the sequence labeling model. We build a new sentiment lexicon for the e-commerce domain and then design feature representations of the domain sentiment lexicon for CRF and BiLSTM-CRF. Experiments on e-commerce reviews show that the proposed method performs well on both models and outperforms methods based on other lexica.
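One simple way to turn a sentiment lexicon into per-token features for a sequence labeler is longest-match lookup with B-/I- prefixes, sketched below. The abstract does not specify the paper's feature design, so this scheme, the function name, the toy lexicon, and the `max_len` parameter are assumptions made for illustration only.

```python
def lexicon_features(tokens, lexicon, max_len=3):
    """Tag tokens with B-/I- lexicon labels using greedy longest match."""
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            polarity = lexicon.get(tuple(tokens[i:i + n]))
            if polarity is not None:
                feats[i] = "B-" + polarity
                for j in range(i + 1, i + n):
                    feats[j] = "I-" + polarity
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return feats


toy_lexicon = {("very", "good"): "POS", ("bad",): "NEG"}
print(lexicon_features(["quality", "very", "good", "but", "bad"], toy_lexicon))
# ['O', 'B-POS', 'I-POS', 'O', 'B-NEG']
```

The resulting label sequence can be supplied to a CRF as an extra feature column or embedded and concatenated with word vectors for a neural model.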
  • Sentiment Analysis and Social Computing
    YU Hualei, RAO Yuan, TANG Caifang, REN Haoran
    2019, 33(2): 122-130.
    The shareholder portrait provides a new way to quickly understand the real preference characteristics behind shareholders' market behaviors, which is significant for the investment decisions of external investors. Constructing shareholder portraits is especially meaningful given the abnormal fluctuations of Chinese stock prices caused by the frequent market behavior of the top ten circulating shareholders, who can always grasp trading opportunities perfectly. This paper analyzes the investment behavior of the top ten circulating shareholders and constructs the shareholder portrait from two aspects: degree of activeness and preference characteristics. Moreover, the shareholders are further classified as individuals, organizations and funds. The completed portrait is designed to cover all aspects of these three kinds of shareholder. In addition, some methods of shareholder labeling are put forward, and some issues in dealing with shareholders' characteristics are discussed together with solutions.
  • NLP Application
    TU Mengchun, LIU Ying
    2019, 33(2): 131-142.
    This article uses long novels by Yu Hua and Mo Yan, five from each, as the corpus. The lengths of paragraphs, sentences and clauses, together with color words, punctuation, parts of speech, words and n-grams, are selected as the features. Clustering and the Kolmogorov-Smirnov (K-S) test are applied to judge the overall similarity of the two authors, and the Wilcoxon test is adopted to validate the difference in a specific feature between the two authors. A detailed analysis reveals that Mo Yan employs a larger vocabulary and shows strong emotions, ancient expressions and regionalisms, while Yu Hua assumes a calm and satirical style.
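The two-sample K-S test compares the empirical distributions of a feature, such as sentence length, in two samples. The sketch below computes only the K-S statistic, that is, the largest gap between the two empirical cumulative distribution functions; it does not compute a p-value, and the function name and toy data are hypothetical rather than taken from the article.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    # bisect_right counts how many observations are <= v in a sorted list.
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in points
    )


# Hypothetical sentence lengths for two authors.
lengths_x = [12, 15, 9, 20, 14]
lengths_y = [25, 30, 22, 28, 35]
print(ks_statistic(lengths_x, lengths_y))  # 1.0, the samples do not overlap
```

A statistic near 0 suggests the two samples follow similar distributions, while a value near 1 indicates they are almost completely separated.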