2018 Volume 32 Issue 6 Published: 15 June 2018
  

    Language Analysis and Calculation
  • Language Analysis and Calculation
    YANG Fengling, ZHOU Qiaoli, CAI Dongfeng, JI Duo
    2018, 32(6): 1-11.
    This paper proposes a semantic role labeling method combined with phrase structure parsing. The method consists of sentence pruning, clause extraction, semantic role analysis, sentence restoration, and modification of argument boundaries. Pruning removes parallel structures and parentheticals, and clause extraction handles different forms of clauses in different ways. Boundary modification mainly targets certain semantic roles. Experiments on the CoNLL-2004 and CoNLL-2005 corpora yield F-scores of 85.66% and 88.25%, respectively.
  • Language Analysis and Calculation
    GUO Dongdong, SONG Jihua, PENG Weiming, ZHANG Yinbing
    2018, 32(6): 12-18.
    International Chinese teaching involves a large number of dynamic words. Three-syllable nouns, a common vocabulary type in international Chinese teaching, are rich in dynamic words. This paper first introduces a knowledge representation method for the structure of three-syllable noun dynamic words. Based on a tagged corpus of international Chinese textbooks, it collects all structural modes of three-syllable noun dynamic words together with their frequencies, forming a structural mode knowledge base of three-syllable noun dynamic words for international Chinese teaching. Finally, three-syllable noun dynamic words are analyzed according to this knowledge base.
  • Language Analysis and Calculation
    TAN Yongmei, YANG Yixiao, YANG Lin, LIU Shuwen
    2018, 32(6): 19-27.
    To handle incorrect usage of articles and prepositions in Grammatical Error Correction (GEC), this paper proposes a sequence labeling method. For incorrect usage of noun form, verb form, and subject-verb agreement, it proposes an N-gram voting strategy based on a corpus collected from ESL (English as a Second Language) essays and news. On the CoNLL-2013 corpus, the method achieves an overall F1 score of 33.87%, outperforming the top-ranked UIUC system (31.20%). It achieves F1 scores of 38.05% for article errors and 28.89% for preposition errors, both exceeding UIUC's results (33.40% and 7.22%, respectively).
  • Language Analysis and Calculation
    GUAN Yong, LV Guoying, LI Ru, GUO Shaoru, TAN Hongye
    2018, 32(6): 28-35,43.
    Discourse title selection in the reading comprehension section of the college entrance examination on Chinese requires selecting the best option by summarizing and analyzing an article. A title usually captures the meaning of the article accurately in a distinctive structure, so summarizing the article and analyzing the title structure are the keys to solving the problem. This paper proposes a correlation analysis model based on titles and discourse key-points. The model constructs a correlation matrix between the title and the discourse key-points, and selects the best answer by combining this matrix with title structure features. Experiments on national college entrance examination questions from the past 10 years verify the validity of the method.
  • Machine Translation
  • Machine Translation
    FAN Wenting, HOU Hongxu, WANG Hongbin, WU Jing, LI Jinting
    2018, 32(6): 36-43.
    Neural machine translation (NMT) has become a prominent model for the Mongolian-Chinese translation task. We implement a neural machine translation model with prior information. On the one hand, we train word representations on a large-scale monolingual corpus and use them as the initial word vectors. On the other hand, we add a part-of-speech feature to the word vectors to address grammatical ambiguity. To address the out-of-vocabulary problem, we use word embeddings to calculate the similarity between words, then replace each out-of-vocabulary word with the most similar word that is covered by the target vocabulary. On the Mongolian-Chinese machine translation task, experimental results show that BLEU increases by 2.68 points.
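The out-of-vocabulary replacement step described in this abstract can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the similarity measure and uses a hypothetical toy embedding table; the function names `cosine` and `replace_oov` are illustrative and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_oov(tokens, embeddings, vocabulary):
    """Replace each out-of-vocabulary token with its most similar in-vocabulary word.

    Tokens without an embedding are left unchanged (an assumption of this sketch).
    """
    # Only in-vocabulary words that have an embedding can serve as replacements.
    candidates = sorted(w for w in vocabulary if w in embeddings)
    result = []
    for token in tokens:
        if token in vocabulary or token not in embeddings or not candidates:
            result.append(token)
            continue
        best = max(candidates, key=lambda w: cosine(embeddings[token], embeddings[w]))
        result.append(best)
    return result


# Hypothetical toy data: "kitten" is out of vocabulary and lies closest to "cat".
toy_embeddings = {
    "cat": [1.0, 0.1],
    "kitten": [0.9, 0.2],
    "car": [0.1, 1.0],
    "sat": [0.5, 0.5],
}
toy_vocabulary = {"cat", "car", "sat"}
replaced = replace_oov(["kitten", "sat"], toy_embeddings, toy_vocabulary)
```

In this toy setting `replaced` is `["cat", "sat"]`; a real system would use embeddings trained on a large monolingual corpus, as the abstract describes.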
  • Machine Translation
    SU Yila, ZHAO Yaping, NIU Xianghua
    2018, 32(6): 44-51.
    High-quality Mongolian-to-Chinese machine translation is of great significance to the development of IT in minority areas. To improve word alignment, a key issue in statistical machine translation (SMT), this paper proposes a Mongolian segmentation method based on stems and affixes. To obtain these basic units for Mongolian-Chinese word alignment, we use a stem and affix table together with the reverse maximum matching algorithm. Experimental results indicate that the proposed method significantly improves alignment quality.
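Reverse maximum matching, the algorithm named in this abstract, scans a word from its end and repeatedly takes the longest trailing piece found in a lexicon. The sketch below is a generic version under stated assumptions: the lexicon is a single set standing in for the stem and affix table, unmatched material falls back to single characters, and the English toy entries are hypothetical placeholders rather than Mongolian data from the paper.

```python
def reverse_maximum_match(word, lexicon, max_len=None):
    """Segment `word` right to left, preferring the longest lexicon entry at each step."""
    if max_len is None:
        max_len = max((len(entry) for entry in lexicon), default=1)
    pieces = []
    end = len(word)
    while end > 0:
        # Start with the longest candidate ending at `end`, then shrink it from the left.
        start = max(0, end - max_len)
        while start < end - 1 and word[start:end] not in lexicon:
            start += 1
        # Either a lexicon entry was found or a single character remains as fallback.
        pieces.append(word[start:end])
        end = start
    pieces.reverse()
    return pieces


# Hypothetical toy table: one stem and two affixes.
toy_lexicon = {"teach", "er", "s"}
segments = reverse_maximum_match("teachers", toy_lexicon)
```

Here `segments` is `["teach", "er", "s"]`. Matching from the right suits suffixing morphology, since affixes at the end of the word are identified before the stem.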
  • Ethnic Language and Cross Language Information Processing
  • Ethnic Language and Cross Language Information Processing
    LI Hong, YU Long, TIAN Shengwei, Turgun Ibrahim, ZHAO Jianguo
    2018, 32(6): 52-61.
    A deep convolutional neural network (DCNN) combined with long short-term memory (LSTM) is proposed to extract emergency events from Uyghur text. The method extracts six major feature blocks contained in emergency events and employs word embeddings. Taking the high-level local features of the event sentence extracted by the DCNN as input, the method captures the sequence relations in the event sentence via the LSTM and trains a softmax classifier to accomplish the task. The method achieves a precision of 80.60%, a recall of 81.39%, and an F-value of 80.99%.
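The three figures reported in this abstract are consistent with the F-value being the harmonic mean of the other two, taking the first figure as precision. The short check below assumes the standard F-beta definition with beta = 1; the function name `f_value` is illustrative.

```python
def f_value(precision, recall, beta=1.0):
    """F-beta score; with beta = 1 this is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Figures from the abstract, in percent.
reported_f = round(f_value(80.60, 81.39), 2)
```

With these inputs `reported_f` evaluates to 80.99, matching the value stated in the abstract.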
  • Ethnic Language and Cross Language Information Processing
    Lin Songkai, Mao Cunli, Yu Zhengtao, Guo Jianyi, Wang Hongbin, Zhang Jiafu
    2018, 32(6): 62-70,79.
    In this paper, we propose a Burmese word segmentation method based on convolutional neural networks. First, we incorporate the syllable structure features of Burmese into the distributed (word vector) representation of Burmese syllables. Then, based on convolutional neural networks, we fuse the features of each syllable and its context to obtain an effective feature representation. The feature vectors for Burmese word segmentation are learned automatically through layer-by-layer feature optimization in the deep network. Finally, we use a softmax classifier to predict the tag sequence of the syllables. Experimental results show that the proposed segmentation method achieves good results.
  • Information Extraction and Text Mining
  • Information Extraction and Text Mining
    WANG Shuai, ZHAO Xiang, LI Bo, GE Bin, TANG Daquan
    2018, 32(6): 71-79.
    With the explosive growth of information on the Internet, improving the efficiency of knowledge acquisition becomes increasingly important. Automatic text summarization provides a good means of fast knowledge acquisition by compressing and refining information. Existing automatic summarization methods exhibit poor accuracy on long text and fail to meet users' performance needs. In this paper, we propose a two-phase automatic summarization method for long text, named TP-AS. First, it employs a hybrid semantic similarity computation method based on a graph model to extract key sentences. Then, it constructs a recurrent neural network encoder-decoder model with attention and pointer mechanisms to generate summaries. Experiments on real large-scale long-text corpora in the financial domain verify the effectiveness of TP-AS, whose summarization accuracy notably outperforms existing methods.
  • Information Extraction and Text Mining
    ZHANG Zhiyuan, ZHAO Yue
    2018, 32(6): 80-87,97.
    Opinion target extraction is an important task in sentiment analysis. Based on a semantic dictionary, this paper proposes seven semantic features of opinion targets in relation to their categories, computed via semantic similarity and relevance. Since syntactic dependencies exist between opinion targets and opinion words, this paper further presents a method for extracting sentiment syntactic dependency features, which ignores objective words and micro-sentiment words to improve accuracy. In experiments on three SemEval datasets, combining the new semantic features with the sentiment syntactic dependency features enables CRFs to reach an F1 score 3.78 points higher than SemEval's best score for constrained systems, and 2 points higher than the best score for unconstrained systems.
  • Information Extraction and Text Mining
    FU Ruiji, WANG Dong, WANG Shijin, HU Guoping, LIU Ting
    2018, 32(6): 88-97.
    This paper proposes the task of elegant sentence recognition in Chinese essays by high school students for Automated Essay Scoring (AES). Because this task is challenging for classical text classification with feature engineering, this paper presents a deep neural network that combines a Convolutional Neural Network (CNN) and a Bi-directional Long Short-Term Memory (BiLSTM) network to recognize elegant sentences. Experimental results show that the joint neural network ranks top in precision (89.23%), with an F1 score comparable to that of BiLSTM (75.39%). Finally, we apply the elegant sentence features to the AES task, which reduces the large-margin prediction error by 21.41%.
  • Information Extraction and Text Mining
    QIU Yingying, HONG Yu, ZHOU Wenxuan, YAO Jianmin, ZHU Qiaoming
    2018, 32(6): 98-106.
    Event extraction aims to extract event information from raw text and represent it in structured form. Supervised learning, a basic approach to event extraction, often suffers from the small scale, imbalanced distribution, and uneven quality of the training corpus. Moreover, traditional event extraction methods based on feature engineering are complicated and prone to error propagation. To address these issues, this paper combines deep learning and active learning, using a query function based on the confidence of RNN-based trigger classification, in order to improve the quality and efficiency of corpus annotation as well as the final performance. Experimental results show that this joint learning method improves event extraction, with substantial room for further exploration.
  • Sentiment Analysis and Social Computing
  • Sentiment Analysis and Social Computing
    ZHANG Shusen, WEI Yudang, LIANG Xun, DOU Yong, XU Yuan, LIANG Tianxin
    2018, 32(6): 114-123.
    Social network structure and user relationships are important topics in social network analysis. In this paper, we study power-law distributions and the identification of kinship between users in a mobile social network. Three power-law distributions are revealed, in the distributions of degree, connected sub-graph scale, and user contacts, and they are compared with those of other social networks. We study a kinship identification model that fuses GBDT (Gradient Boosting Decision Tree) and LR (Logistic Regression), using a variety of salient features extracted from users' call behavior. Experiments indicate that the model can determine whether kinship exists between users with a precision of 81.01%.
  • Sentiment Analysis and Social Computing
    XU Linhong, LIN Hongfei, QI Ruihua, GUAN Jinghua
    2018, 32(6): 124-131.
    Text sentiment analysis, one of the hot topics in natural language processing, builds on the analysis of the lexicon. Considering that Chinese characters, the constituents of the lexicon, convey meaning through both sound and written form, this paper aims to build a classification of the sentiment lexicon through a comprehensive analysis of the radicals and phonemes of each character. In our model, the Chinese characters, radicals, and phonemes are each vectorized and then integrated with the original word vector to generate new representations of sentiment words. Finally, the polarities of the sentiment words are classified with a feedforward neural network, a convolutional neural network, and other approaches. Experimental results show that the three types of vector features effectively improve the accuracy of sentiment lexicon classification and also yield better sentiment sentence classification results on the COAE data.
  • NLP Application
  • NLP Application
    QIU Bing, HUANGFU Wei, ZHU Qingzhi
    2018, 32(6): 132-142.
    Ancient Chinese is a core course in the Chinese language and literature program. However, its intended learning outcomes with the existing textbooks are hard to evaluate, as the article selection, language point analysis, and overall arrangement are mostly based on editors' subjective experience. In order to quantify the expected learning outcomes, we put forward a novel approach based on a teaching-oriented lexical corpus of pre-Qin classics, which takes the frequency, importance, and semantic evolution of words into consideration. Then a case study is carried out by comparing two representative textbooks, Ancient Chinese (by Prof. Wang Li, editor in chief) and An Ancient Chinese Reader (2nd edition, by Prof. Wang Shuo), in regard to text length, language point density, distribution of new language points, as well as the learning curve. The quantitative results support the traditional qualitative understanding of these two sets and prove that our approach is valid. In the end, the re-ordering of the articles in the textbooks is discussed and a new learning curve which better fits the principle of gradual improvement is obtained.