Journal of Chinese Information Processing

Language Analysis and Calculation

Semantic Role Labeling Combined with Phrase Structure Parsing

YANG Fengling, ZHOU Qiaoli, CAI Dongfeng, JI Duo

2018, 32(6): 1-11.

This paper proposes a semantic role labeling method combined with phrase structure parsing. It consists of sentence pruning, clause extraction, semantic role analysis and sentence restoration, and modification of argument boundaries. Pruning removes parallel structures and parenthetical material; clause extraction applies different processing methods to different forms of clauses; and boundary modification is aimed mainly at certain semantic roles. Experiments on the CoNLL-2004 and CoNLL-2005 corpora yield F-scores of 85.66% and 88.25%, respectively.

Analysis of Three Syllable Noun Dynamic Words Based on the Corpus of International Chinese Textbooks

GUO Dongdong, SONG Jihua, PENG Weiming, ZHANG Yinbing

2018, 32(6): 12-18.

Dynamic words are abundant in international Chinese teaching, and three-syllable nouns, a common vocabulary class in this field, are especially rich in them. This paper first introduces a knowledge representation method for the structure of three-syllable noun dynamic words. Using a tagged corpus of international Chinese textbooks, it collects all structural modes of three-syllable noun dynamic words together with their frequencies, forming a structural mode knowledge base of three-syllable noun dynamic words for international Chinese teaching. Finally, the three-syllable noun dynamic words are analyzed on the basis of this knowledge base.

Grammatical Error Correction Using LSTM and N-gram

TAN Yongmei, YANG Yixiao, YANG Lin, LIU Shuwen

2018, 32(6): 19-27.

To handle incorrect usage of articles and prepositions in Grammatical Error Correction (GEC), this paper proposes a sequence labeling method. For incorrect noun forms, verb forms, and subject-verb agreement, it proposes an N-gram voting strategy based on a corpus collected from ESL (English as a Second Language) essays and news. On the CoNLL-2013 corpus, the method achieves an overall F1 score of 33.87%, outperforming the top-ranked UIUC system (31.20%), with F1 scores of 38.05% for article errors and 28.89% for preposition errors, both exceeding UIUC's results (33.40% and 7.22%, respectively).
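
The abstract does not spell out how the N-gram votes are cast, so the following is only one plausible reading, sketched under stated assumptions: every 2-gram and 3-gram window covering the suspect word votes for the candidate form whose substituted window is most frequent in a corpus, and the form with the most votes wins. The function name `ngram_vote` and the toy `counts` table are hypothetical, not the authors' data or code.

```python
from collections import Counter


def ngram_vote(tokens, i, candidates, counts, max_n=3):
    """Choose a form for tokens[i] by letting each n-gram window vote.

    counts is a hypothetical mapping from token tuples to corpus frequencies.
    """
    votes = Counter()
    for n in range(2, max_n + 1):
        # Every window of length n that covers position i.
        first = max(0, i - n + 1)
        last = min(i, len(tokens) - n)
        for start in range(first, last + 1):
            def freq(cand, start=start, n=n):
                window = tokens[start:start + n]  # slice copy, tokens untouched
                window[i - start] = cand          # substitute the candidate form
                return counts.get(tuple(window), 0)

            best = max(candidates, key=freq)
            if freq(best) > 0:  # windows with no corpus evidence abstain
                votes[best] += 1
    if not votes:
        return tokens[i]  # no evidence at all: keep the original form
    return votes.most_common(1)[0][0]


tokens = ["he", "go", "to", "school"]
counts = {
    ("he", "go"): 2, ("he", "goes"): 50,
    ("go", "to"): 60, ("goes", "to"): 40,
    ("he", "goes", "to"): 30,
    ("go", "to", "school"): 15, ("goes", "to", "school"): 20,
}
print(ngram_vote(tokens, 1, ["go", "goes", "went"], counts))  # goes
```

Here three windows favour "goes" and one favours "go", so the verb form is corrected; with an empty count table the original token is kept.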

Discourse Title Selection for Chinese Reading Comprehension of College Entrance Examination

GUAN Yong, LV Guoying, LI Ru, GUO Shaoru, TAN Hongye

2018, 32(6): 28-35,43.

In the Chinese reading comprehension section of the college entrance examination, discourse title selection requires choosing the best title option by summarizing and analyzing the article. A title usually captures the meaning of the article accurately in a distinctive structure, so summarizing information about the article and analyzing the title structure are the keys to solving the problem. This paper proposes a correlation analysis model based on the title and discourse key points. The model constructs a correlation matrix between the title and the discourse key points and selects the best answer jointly with title structure features. Experiments on national college entrance examination questions from the past 10 years verify the validity of the method.

Machine Translation

Mongolian-Chinese Neural Machine Translation with Priori Information

FAN Wenting, HOU Hongxu, WANG Hongbin, WU Jing, LI Jinting

2018, 32(6): 36-43.

Neural machine translation (NMT) has become a prominent model for the Mongolian-Chinese translation task. We implement an NMT model with prior information. On the one hand, we train word representations on a large-scale monolingual corpus and use them as the initial word vectors. On the other hand, we add part-of-speech features to the word vectors to address grammatical ambiguity. To address the out-of-vocabulary (OOV) problem, we use word embeddings to calculate word similarity and then replace each OOV word with the most similar word covered by the target vocabulary. In the Mongolian-Chinese machine translation task, experimental results show a BLEU improvement of 2.68 points.
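
The OOV replacement step can be pictured with a minimal sketch, assuming cosine similarity as the similarity measure (the abstract does not name one) and toy two-dimensional vectors in place of trained embeddings. `replace_oov`, the `<unk>` fallback, and the example words are illustrative assumptions only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def replace_oov(tokens, vocab, embeddings, unk="<unk>"):
    """Replace each out-of-vocabulary token with its most similar in-vocabulary word."""
    known = [w for w in vocab if w in embeddings]
    result = []
    for tok in tokens:
        if tok in vocab:
            result.append(tok)
        elif tok in embeddings and known:
            nearest = max(known, key=lambda w: cosine(embeddings[tok], embeddings[w]))
            result.append(nearest)
        else:
            result.append(unk)  # no embedding to compare against: fall back to <unk>
    return result


embeddings = {"cat": [1.0, 0.0], "car": [0.0, 1.0], "kitten": [0.95, 0.05]}
print(replace_oov(["kitten", "car", "zzz"], {"cat", "car"}, embeddings))
# ['cat', 'car', '<unk>']
```

A real system would search a large embedding matrix with vectorized operations; the linear scan here only keeps the idea visible.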

Research on Word Alignment in Mongolian-Chinese Statistical Machine Translation

SU Yila, ZHAO Yaping, NIU Xianghua

2018, 32(6): 44-51.

High-quality Mongolian-Chinese machine translation is of great significance to the development of information technology in minority areas. Word alignment is a key issue in statistical machine translation (SMT); to address it, this paper proposes a Mongolian segmentation method based on stems and affixes. To obtain these basic units for Mongolian-Chinese word alignment, we use a stem and affix table together with the reverse maximum matching algorithm. Experimental results indicate that the proposed method significantly improves alignment quality.
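
Reverse maximum matching itself is a standard dictionary-based segmentation algorithm, and a minimal sketch of it follows. The stem and affix table and the Latin transliteration ("geruudees" split as stem plus plural and case suffixes) are made up for illustration and are not the authors' resources.

```python
def reverse_max_match(word, table, max_len=None):
    """Segment word right-to-left, always taking the longest unit found in table."""
    if max_len is None:
        max_len = max((len(unit) for unit in table), default=1)
    max_len = max(1, max_len)  # guarantee progress even for an empty table
    segments = []
    end = len(word)
    while end > 0:
        # Try the longest candidate ending at `end` first; a single
        # character is accepted as a fallback so the loop always advances.
        for size in range(min(max_len, end), 0, -1):
            piece = word[end - size:end]
            if piece in table or size == 1:
                segments.append(piece)
                end -= size
                break
    segments.reverse()
    return segments


table = {"ger", "uud", "ees"}  # hypothetical stem/affix table
print(reverse_max_match("geruudees", table))  # ['ger', 'uud', 'ees']
```

Scanning from the right suits suffixing languages such as Mongolian, because affixes are peeled off before the stem is reached.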

Ethnic Language and Cross Language Information Processing

Uyghur Emergency Event Extraction Based on DCNNs-LSTM Model

LI Hong, YU Long, TIAN Shengwei, Turgun Ibrahim, ZHAO Jianguo

2018, 32(6): 52-61.

A model combining deep convolutional neural networks (DCNNs) with long short-term memory (LSTM) is proposed to extract emergency events from Uyghur text. The method extracts six major feature blocks contained in emergency events and employs word embeddings. DCNNs extract high-level local features of each event sentence, which are fed to an LSTM that captures sequence relations within the sentence; a Softmax classifier is then trained to complete the task. The method achieves a precision of 80.60%, a recall of 81.39%, and an F value of 80.99%.
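
The three reported figures are mutually consistent: the F value is the harmonic mean of precision and recall, as this small check shows.

```python
def f_value(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    return 2 * precision * recall / (precision + recall)


print(round(f_value(80.60, 81.39), 2))  # 80.99
```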

A Method of Myanmar Word Segmentation Based on Convolution Neural Network

LIN Songkai, MAO Cunli, YU Zhengtao, GUO Jianyi, WANG Hongbin, ZHANG Jiafu

2018, 32(6): 62-70,79.

In this paper, we propose a Burmese word segmentation method based on a convolutional neural network. First, we incorporate Burmese syllable structure features into the distributed word-vector representation of Burmese syllables. Then, using convolutional neural networks, we fuse the features of each syllable and its context to obtain an effective feature representation, and the deep network automatically learns effective feature vectors for Burmese word segmentation through layer-by-layer feature optimization. Finally, we use a softmax classifier to predict a sequence label for each syllable. Experimental results show that the proposed segmentation method achieves good results.
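
Once each syllable has a predicted label, the labels must be turned back into words. The abstract does not name its tag set, so this sketch assumes the common B/M/E/S scheme (begin, middle, end, single) and uses placeholder syllables rather than Burmese script; `tags_to_words` is a hypothetical helper.

```python
def tags_to_words(syllables, tags):
    """Join syllables into words using B/M/E/S labels (begin, middle, end, single)."""
    words, current = [], []
    for syllable, tag in zip(syllables, tags):
        if tag in ("B", "S") and current:
            # A new word starts before the previous one was closed: flush it.
            words.append("".join(current))
            current = []
        current.append(syllable)
        if tag in ("E", "S"):
            words.append("".join(current))
            current = []
    if current:  # sequence ended without an E or S label
        words.append("".join(current))
    return words


print(tags_to_words(["s1", "s2", "s3", "s4"], ["B", "E", "S", "S"]))
# ['s1s2', 's3', 's4']
```

The flush on an unexpected B or S makes the decoder tolerant of ill-formed label sequences that a per-syllable softmax can produce.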

Information Extraction and Text Mining

TP-AS: A Two-phase Approach to Long Text Automatic Summarization

WANG Shuai, ZHAO Xiang, LI Bo, GE Bin, TANG Daquan

2018, 32(6): 71-79.

With the explosive growth of information on the Internet, improving the efficiency of knowledge acquisition has become more important. Automatic text summarization provides a good means of fast knowledge acquisition by compressing and refining information. When dealing with long text, however, existing automatic summarization methods exhibit poor accuracy and fail to meet users' performance needs. In this paper, we propose TP-AS, a two-phase automatic summarization method for long text. First, it employs a graph-model-based hybrid semantic similarity computation to extract key sentences. Then, it constructs a recurrent neural network encoder-decoder model with attention and pointer mechanisms to generate summaries. Experiments on real, large-scale long-text corpora from the financial domain verify the effectiveness of TP-AS, whose summarization accuracy notably exceeds that of existing methods.
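
The first phase, graph-based key sentence extraction, can be illustrated with a TextRank-style ranking. This is a stand-in, not TP-AS itself: the abstract's hybrid similarity measure is not specified, so the sketch simply assumes a precomputed symmetric sentence-similarity matrix and scores sentences by weighted PageRank power iteration.

```python
def rank_sentences(sim, damping=0.85, iterations=50):
    """TextRank-style scores from a symmetric sentence-similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    # Total outgoing edge weight of each sentence (self-similarity ignored).
    out_weight = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_key_sentences(sim, k):
    """Indices of the k highest-scoring sentences, in original order."""
    scores = rank_sentences(sim)
    best = sorted(range(len(sim)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


sim = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
print(extract_key_sentences(sim, 1))  # [1]
```

Sentence 1 is similar to both others while they share nothing, so it collects the most weight; the selected sentences would then be passed to the encoder-decoder in the second phase.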

Opinion Target Extraction Based on Semantic and Syntactic Dependency

ZHANG Zhiyuan, ZHAO Yue

2018, 32(6): 80-87,97.

Opinion target extraction is an important task in sentiment analysis. Based on a semantic dictionary, this paper proposes seven semantic features that relate opinion targets to their categories through semantic similarity and relevance computation. Since syntactic dependencies exist between opinion targets and opinion words, this paper further presents a method for extracting sentiment syntactic dependency features, which ignores objective words and micro sentiment words to improve accuracy. In experiments on three SemEval datasets, combining the new semantic features with the sentiment syntactic dependency features enables CRFs to achieve an F1 score 3.78 points higher than the best SemEval score for constrained systems and 2 points higher than the best for unconstrained systems.

Elegant Sentence Recognition for Automated Essay Scoring

FU Ruiji, WANG Dong, WANG Shijin, HU Guoping, LIU Ting

2018, 32(6): 88-97.

This paper proposes the task of recognizing elegant sentences in Chinese essays written by high school students, in support of Automated Essay Scoring (AES). Because this task is challenging for classical text classification with feature engineering, this paper presents a deep neural network that combines a Convolutional Neural Network (CNN) with Bi-directional Long Short-Term Memory (BiLSTM) networks to recognize elegant sentences. Experimental results show that the joint neural network ranks top in precision (89.23%), with an F1 score comparable to that of BiLSTM (75.39%). Finally, applying elegant sentence features to the AES task reduces the large-margin prediction error by 21.41%.

Combining Deep Learning and Active Learning for Event Extraction

QIU Yingying, HONG Yu, ZHOU Wenxuan, YAO Jianmin, ZHU Qiaoming

2018, 32(6): 98-106.

Event extraction aims to extract event information from raw text and represent it in structured form. Supervised learning, a basic approach to event extraction, often suffers from training corpora that are small, imbalanced in distribution, and uneven in quality. Moreover, traditional event extraction methods based on feature engineering are complex and prone to error propagation. To address these issues, this paper presents a method that combines deep learning and active learning, using a query function based on the confidence of RNN-based trigger classification, in order to improve the quality and efficiency of corpus annotation as well as final extraction performance. Experimental results show that this joint learning method improves event extraction, while leaving substantial room for further exploration.
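
A confidence-based query function is commonly realized as least-confidence sampling: the unlabeled samples whose most probable class has the lowest probability are sent to annotators first. The sketch below assumes that reading; the probability lists stand in for hypothetical softmax outputs of a trigger classifier and are not from the paper.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the batch_size samples whose top class probability is lowest."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical softmax outputs of a trigger classifier for four unlabeled sentences.
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(probs, 2))  # [3, 1]
```

After the selected samples are labeled, they are added to the training set and the classifier is retrained, which closes one active learning round.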

Sentiment Analysis and Social Computing

Research on Power-law Distribution and Identification of Kinship in Mobile Social Network

ZHANG Shusen, WEI Yudang, LIANG Xun, DOU Yong, XU Yuan, LIANG Tianxin

2018, 32(6): 114-123.

Social network structure and user relationships are important topics in social network analysis. In this paper, we study power-law distributions and the identification of kinship between users in a mobile social network. Three power-law distributions are revealed, in node degree, connected subgraph size, and user contacts, and they are compared with those of other social networks. We build a kinship identification model that fuses GBDT (Gradient Boosting Decision Tree) and LR (Logistic Regression), using a variety of salient features extracted from users' call behavior. Experiments indicate that the model can determine whether two users are kin with a precision of 81.01%.
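
The abstract does not say how the power-law exponents were fitted. As an illustration only, a standard continuous maximum-likelihood estimator for the exponent of the tail above a chosen lower bound is sketched here; applying it to integer degrees is an approximation, and the degree list is hypothetical.

```python
import math


def powerlaw_alpha(samples, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only samples >= x_min are used: alpha = 1 + n / sum(ln(x / x_min)).
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1 + len(tail) / log_sum


degrees = [1, 1, 2, 3, 5, 8, 13, 40]  # hypothetical node degrees
print(powerlaw_alpha(degrees, x_min=1))
```

Choosing `x_min` matters in practice; it is usually selected by comparing the fitted tail against the empirical distribution rather than fixed in advance.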

Sentiment Lexicon Embedding Based on Radical and Phoneme

XU Linhong, LIN Hongfei, QI Ruihua, GUAN Jinghua

2018, 32(6): 124-131.

Text sentiment analysis, one of the hot topics in natural language processing, relies on lexicon analysis. Since Chinese characters, the constituents of words, convey meaning through both sound and written form, this paper builds a taxonomy of sentiment lexicon by comprehensively analyzing the radicals and phonemes of each character. In our model, each Chinese character, its radicals, and its phonemes are vectorized and integrated with the original word vector to form new representations of sentiment words; the polarities of sentiment words are then classified with a feedforward neural network, a convolutional neural network, and other approaches. Experimental results show that the three types of vector features effectively improve the accuracy of sentiment lexicon classification and also yield better sentiment sentence classification results on the COAE data.

NLP Application

A Corpus-based Evaluation for Intended Learning Outcomes of Ancient Chinese Textbooks

QIU Bing, HUANGFU Wei, ZHU Qingzhi

2018, 32(6): 132-142.

Ancient Chinese is a core course in the Chinese language and literature program. However, its intended learning outcomes with the existing textbooks are hard to evaluate, as the article selection, language point analysis, and overall arrangement are mostly based on editors' subjective experience. In order to quantify the expected learning outcomes, we put forward a novel approach based on a teaching-oriented lexical corpus of pre-Qin classics, which takes the frequency, importance, and semantic evolution of words into consideration. Then a case study is carried out by comparing two representative textbooks, Ancient Chinese (by Prof. Wang Li, editor in chief) and An Ancient Chinese Reader (2nd edition, by Prof. Wang Shuo), in regard to text length, language point density, distribution of new language points, as well as the learning curve. The quantitative results support the traditional qualitative understanding of these two sets and prove that our approach is valid. In the end, the re-ordering of the articles in the textbooks is discussed and a new learning curve which better fits the principle of gradual improvement is obtained.

2018 Volume 32 Issue 6 Published: 15 June 2018