2018 Volume 32 Issue 1 Published: 15 January 2018
  

  • Survey
    LIU Lei, LIANG Maocheng
    2018, 32(1): 1-8.
    Automatic detection and correction of grammatical errors in EFL learners' writing can support automatic evaluation of essay quality and assist learners by providing written corrective feedback that facilitates autonomous learning. In this paper, we survey the application of NLP technology to automatic grammatical error correction of English learners' writing over the last decade. First, we introduce the publicly available large-scale native and learner English corpora and three data-driven methods for automatic grammatical error correction systems. We then discuss the development of system evaluation methods. Lastly, we conclude with some suggestions for future directions.
  • Language Analysis and Calculation
    SU Pei, JIANG Minghu, BAI Chen
    2018, 32(1): 9-17.
    Metaphors are general and pervasive in everyday language. They are also an important way for us to understand and describe the world. This study focuses on a metaphorical form unique to Chinese and examines the cognitive mechanism of metaphor in Chinese “DE” phrases of the NP + NP form. The language data are sorted and preprocessed along three dimensions: plausibility, familiarity and figurativeness. The results show that the metaphorical expressions evoked a significant negative N400 component compared with the literal expressions. While both metaphorical and literal expressions are rated similarly as familiar and interpretable, the ERP results show that the conventional metaphors required a short burst of additional processing effort when compared with literal ones. Besides the “A is B” form, the NP + NP form of metaphor in Chinese can also evoke a larger N400 effect. The experiment further shows that even if the source and target domains do not appear at the same time, the expression can still stimulate the brain's cognitive mechanism of metaphor.
  • Language Analysis and Calculation
    YANG Yi, FENG Wenhe
    2018, 32(1): 18-25.
    This paper focuses on the annotation and analysis of the Russian syntactic analogues of Chinese clauses, based on Chinese-Russian parallel official documents. First, the Chinese (source) texts are segmented into clauses, and the Russian (target) texts are segmented into the corresponding Russian syntactic analogues of those clauses. Then, we set out the general principles and an annotation system for annotating these Russian syntactic analogues. Finally, we examine the Russian data and report the following results: 1) Most analogues are sentence components (74.85%) rather than full sentences (25.15%). 2) The single predicative center accounts for the largest proportion (69.04%), followed by the non-predicative center (27.63%) and the multi-predicative center (3.33%). 3) The simple predicate accounts for the largest share of the single predicative center (31.84%), verb phrases account for the largest share of the non-predicative center (51.26%), and subordinate sentences account for the largest share of the multi-predicative center (47.92%).
  • Language Analysis and Calculation
    LIU Hongchao, HUANG Churen, HOU Renkui, LI Hongzheng
    2018, 32(1): 26-33.
    This paper investigates the prediction of event types of Mandarin verbs, which are divided either into three classes (state, activity and transition) or into four (state, activity, accomplishment and achievement). Previous linguistic studies of the event types of Mandarin verbs have proposed various features for different event types, but none of these features has been validated by statistical or computational methods. Both supervised and unsupervised vectors are examined for prediction, i.e. the linguistic features and the embedding vectors produced by word2vec, respectively. We achieve an overall accuracy of 73.6% using multinomial regression, support vector machine and neural network classifiers.
  • Language Analysis and Calculation
    QU Jianju, FENG Minxuan
    2018, 32(1): 34-42.
    Based on the word-formation knowledge in a knowledge base, this paper uses a phased algorithm to automatically predict the word-formation knowledge of unknown words. Through the combination of morpheme meanings or the combination of the morphemes' semantic categories, the method first predicts the semantic-level knowledge, then determines the corresponding morphemes, and finally obtains the word-formation knowledge of the unknown words. The algorithm is simple, intuitive and reasonable. Under the criterion that the first morpheme's part of speech, semantic category and meaning, the last morpheme's part of speech, semantic category and meaning, and the grammatical structure type must all be correct, the experimental results show that the prediction accuracy is 62.32% and the recall rate is 61.72%.
  • Language Analysis and Calculation
    JIANG Feng, CHU Xiaomin, XU Sheng, LI Peifeng, ZHU Qiaoming
    2018, 32(1): 43-50.
    Discourse analysis is an important task in the field of natural language processing. The analysis of primary and secondary relations at the discourse level helps to understand discourse structure and semantics. Building on research into primary and secondary relation recognition at the micro discourse level, this paper targets primary and secondary relations at the macro discourse level and provides a recognition model based on topic similarity computed with word2vec and LDA. The word2vec-based topic similarity and the LDA-based topic similarity measure semantic similarity along different dimensions. They are complementary at the semantic level, which enhances the ability of the model to recognize macro discourse-level primary and secondary relations. Experimental results on the Macro Chinese Discourse TreeBank (MCDTB) show that our model achieves 79.9% in F1-score and 81.82% in accuracy, improving on the baseline by 1.7% and 1.81%, respectively.
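The idea of pairing an embedding-based similarity with a topic-distribution similarity can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the use of averaged word vectors, the use of cosine similarity, and the toy inputs are not taken from the paper, which does not specify these details in its abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is empty or zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings` (hypothetical lookup table)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def similarity_features(tokens_a, tokens_b, topics_a, topics_b, embeddings):
    """Return [embedding similarity, topic similarity] for two discourse units.

    `topics_a` and `topics_b` stand in for per-unit topic distributions
    (e.g. as an LDA model might produce); how they are obtained is assumed.
    """
    emb_sim = cosine(mean_vector(tokens_a, embeddings),
                     mean_vector(tokens_b, embeddings))
    topic_sim = cosine(topics_a, topics_b)
    return [emb_sim, topic_sim]
```

The two values could then be fed to any classifier as complementary features; keeping them separate rather than averaging them lets the classifier weight each view on its own.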
  • Language Analysis and Calculation
    LU Zhenhuan, KONG Fang, ZHOU Guodong
    2018, 32(1): 51-58.
    Event coreference resolution has an obvious impact on many other NLP applications, e.g., discourse analysis and information extraction. A complete framework based on CNNs is proposed for event coreference resolution. Three issues are addressed. First, filtering strategies based on semantic compatibility and temporal consistency are employed to reduce the distribution imbalance. Second, a combined representation, consisting of a minimal event self-description and an additional description of the relationship between events, is applied to handle different event annotation schemas, i.e., multiple corpora. Finally, a global inference post-processing step is designed to optimize the locally optimal solution generated by the event-pair model. Experiments on the KBP2015 and ACE2005 corpora show the effectiveness of the proposed approach.
  • Language Analysis and Calculation
    YAO Dengfeng, JIANG Minghu, Abudoukelimu Abulizi, LI Hanjing, Halidanmu Abudukelimu
    2018, 32(1): 59-67.
    This paper tries to simulate the process of sign processing in the human brain and designs a hybrid neural network model for sign language understanding based on a phonological model, i.e. converting the phonological information of the hands into Chinese text. We first integrate the advantages of the two perspectives of simultaneity and sequence in sign language and propose an improved model of sign language phonology. A first-perception, first-comprehension algorithm is then designed based on the cognitive mechanism of the brain; it produces Chinese text directly from the phonological features of the sign, which can act as linguistic features. Compared with the traditional method that deduces Chinese text from graphic features, this algorithm represents tremendous progress in cognitive computing. Experimental results verify the feasibility of this intelligent cognitive technology, which lays a technical foundation for realizing robot intelligence.
  • Language Resources Construction
    WANG Enxu, YUAN Yulin
    2018, 32(1): 68-74,95.
    Machine understanding of words is mainly based on dictionaries, but present dictionary interpretations are inaccurate and imperfect. This paper investigates the issue by analyzing the semantic structure of words and constructing interpretation templates for them. By analyzing the words' semantic structure, we try to discover what semantic components and semantic relations words contain, and to determine which of them are necessary and which are not. Then, with examples, this paper discusses the process, principles and methods of constructing interpretation templates. Finally, this paper shows that constructing interpretation templates helps to solve issues such as the interpretation of polysemous words, of synonyms and of new words.
  • Language Resources Construction
    WU Yongpeng, LI Sujian, QIN Mukun, YANG An, WANG Houfeng
    2018, 32(1): 75-82.
    Based on the idea of Discourse Dependency Relations, a small-scale Chinese and English Discourse Dependency Treebank is constructed in this paper. Difficulties faced during the annotation process, such as the multi-nucleus relation problem, the selection of relations, the annotation of long and complicated discourses, and the loss of information in the hierarchical structure, are discussed, and solutions are provided. We conduct a basic statistical analysis of the Discourse Dependency Treebank and explore the similarities and differences between Chinese and English discourse.
  • Other Language in/around China
    LIANG Jinlian, Gulila Altenbek
    2018, 32(1): 83-88.
    A coarse-to-fine strategy is applied for the two-stage syntactic analysis of Kazakh phrase structure. The first stage generates the 20-best parses with a rough parser. The second stage employs the perceptron method to re-rank them, using the extracted features, to select the best result. This method not only obtains the sentence structure through the two stages, but also provides detailed feature information for better analysis of the result. Experiments indicate that the parser reaches an accuracy of 71.4%.
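The second, re-ranking stage can be sketched as a standard structured perceptron over an n-best list. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names, the representation of each candidate parse as a feature dictionary, and the choice of a single "gold" (best available) candidate per sentence are all hypothetical.

```python
def score(weights, feats):
    """Linear score of one candidate parse, given sparse features {name: value}."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate in an n-best list."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def train_perceptron(data, epochs=5):
    """Train re-ranking weights.

    `data` is a list of (candidates, gold_index) pairs, where `candidates`
    is the n-best list from the first-stage parser and `gold_index` marks
    the candidate treated as correct (an assumption of this sketch).
    """
    weights = {}
    for _ in range(epochs):
        for candidates, gold in data:
            pred = rerank(weights, candidates)
            if pred != gold:
                # Standard perceptron update: reward the gold candidate's
                # features and penalise the wrongly preferred candidate's.
                for name, value in candidates[gold].items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in candidates[pred].items():
                    weights[name] = weights.get(name, 0.0) - value
    return weights
```

With a 20-best list, `candidates` would simply hold 20 feature dictionaries per sentence. Because the learned weights are per-feature, inspecting them afterwards gives the kind of detailed feature information the abstract mentions.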
  • Other Language in/around China
    FAN Daoerji, GAO Guanglai, WU Huijuan
    2018, 32(1): 89-95.
    A public, well-recognized Mongolian offline handwritten database is the basis for the research and development of Mongolian handwriting recognition systems. Based on research into Mongolian coding, word formation and grammar, a large-vocabulary Mongolian offline handwritten database (MHW) is constructed, which contains 100000 Mongolian word samples, i.e. 20 samples for each of 5000 words. Test set I contains 5000 samples and test set II contains 14085 samples. An automatic error detection algorithm, based on the variable length of each Mongolian word, is applied. The performance of MHW is validated on three popular handwriting recognition models, among which the Recurrent Neural Network based model shows the best performance of 2.20% on test set I and 5.55% on test set II with a constrained dictionary.
  • Information Extraction and Text Mining
    CAI Qiang, HAO Jiayun, CAO Jian, LI Haisheng
    2018, 32(1): 96-101.
    To best exploit local and global features, we propose a distantly supervised relation extraction model based on a multi-level attention mechanism. We employ an attention matrix in the pooling layer to capture word-level semantic features, which indicate the relevance between input words and relations. Moreover, we adopt a sentence-level attention mechanism to compare the relationship between sentences and predicted relations. Experimental results show that the mean accuracy of the proposed model reaches 78% on the NYT data set, indicating an effective use of multi-level features and better performance on the distant relation extraction task.
  • Information Extraction and Text Mining
    WEN Wen, WU Sijie, CAI Ruichu, HAO Zhifeng
    2018, 32(1): 102-115.
    Knowledge-entity type labeling is important for the structural management of literature data. However, since knowledge entities are highly specialized and have diversified types, traditional entity-extraction and labeling methods do not produce good results on literature data. To solve this problem, we summarize several characteristics of knowledge entities by exploring the literature data. According to these characteristics, we then propose a method that combines unsupervised and semi-supervised learning, based on heuristic rules and multi-label weighted LPA propagation. This method is able to extract candidate labels from the data and performs knowledge-entity labeling without manual annotation. Experimental results demonstrate that the proposed method is flexible and more suitable for literature data.
  • Information Extraction and Text Mining
    LI Lishuang, GUO Yuankai
    2018, 32(1): 116-122.
    Named entity recognition (NER) is an important step in natural language processing (NLP). In recent years, end-to-end neural network models for named entity recognition have shown better performance on general-domain datasets (e.g. news) without additional hand-crafted features. However, in the biomedical domain, recent studies indicate that hand-designed features have a great impact on a model's performance. In this paper, we propose a novel end-to-end neural network model, CNN-BLSTM-CRF, which does not rely on hand-designed features or domain knowledge. A CNN (convolutional neural network) extracts character vectors with shape features from each word; these are concatenated with the word embeddings and fed to the BLSTM-CRF network. We evaluate our approach by comparing it against existing neural network models for NER on the Biocreative II GM dataset and the JNLPBA2004 dataset. The results show that our system reaches F-scores of 89.09% and 74.40%, respectively, and outperforms other state-of-the-art methods.
  • Information Extraction and Text Mining
    LUAN Kexin, DU Xinkai, SUN Chengjie, LIU Bingquan, WANG Xiaolong
    2018, 32(1): 123-130.
    Sentence ordering is a key technology for multi-document summarization and answer fusion, and it directly affects the readability of the output text. To capture the inherent semantic and logical relations between sentences, this paper proposes a sentence ordering model with an attention mechanism. Experimental results show that the proposed model is superior to the baseline method on the sentence ordering task.
  • Sentiment Analysis and Social Computing
    LI Zekui, LI Xueting, ZHAO Yanyan
    2018, 32(1): 131-138.
    Microblogging is an emerging social media platform on which more and more netizens obtain and share information. With tens of millions of microblog posts produced per day, analyzing users' attitudes towards an event is a meaningful task. This paper reveals that the emotion distribution differs when the public talks about different topics of an event. In response to that phenomenon, we propose a method that combines an unsupervised learning method based on hierarchical clustering with a semi-supervised learning algorithm used for topic rectification, so that we can mine the topics under an event as well as their microblogs. We then analyze the emotion distribution using related sentiment analysis algorithms. Experimental results show that the proposed method can accurately analyze the causes of the emotion distribution of an event.
  • Sentiment Analysis and Social Computing
    YIN Hao, LI Shoushan, GONG Zhengxian, ZHOU Guodong
    2018, 32(1): 139-145.
    Emotion classification is an important research issue in natural language processing. It aims at classifying the type of emotion automatically. However, most existing studies assume that the labeled data of each emotion category is balanced, which may not be true in practice. Targeting emotion classification on imbalanced data, this paper proposes a novel approach called multi-channel LSTM, which combines several LSTM models. First, the approach builds several groups of balanced training data by under-sampling. Second, it trains one LSTM model on each training group. Finally, it yields the results by joining all the LSTM models. Evaluation on SINA microblog data shows that the proposed approach performs better than several existing imbalanced classification methods.
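The under-sampling and combination steps can be sketched independently of the LSTM models themselves. The following is a minimal illustration under stated assumptions: the function names are hypothetical, each balanced group is drawn by random sampling down to the size of the smallest class, and the per-model outputs are joined by averaging class probabilities. The abstract does not say how the models are joined, so the averaging is a stand-in rather than the authors' method.

```python
import random
from collections import defaultdict


def balanced_subsets(samples, n_subsets, seed=0):
    """Draw `n_subsets` balanced training groups by random under-sampling.

    `samples` is a list of (text, label) pairs. Every group contains the
    same number of examples per label: the size of the smallest class.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    per_class = min(len(items) for items in by_label.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for items in by_label.values():
            subset.extend(rng.sample(items, per_class))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def combine(prob_dicts):
    """Average {label: probability} outputs of several models; return the top label."""
    totals = defaultdict(float)
    for probs in prob_dicts:
        for label, p in probs.items():
            totals[label] += p / len(prob_dicts)
    return max(totals, key=totals.get)
```

In use, one model would be trained on each group returned by `balanced_subsets`, and `combine` would be applied to their predictions for a new post. Because each group re-samples the majority classes, the ensemble as a whole sees more of the majority data than any single balanced model would.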