Journal of Chinese Information Processing

Survey

A Survey on the Automatic Text Readability Measures

WU Siyuan, CAI Jianyong, YU Dong, JIANG Xin

2018, 32(12): 1-10.

The concept of readability was originally proposed by educators to help select suitable reading materials for learners. This paper surveys existing work on automatic text readability measures and groups the methods into three types: formula-based, classification-based, and ranking-based. It also outlines the databases and the extracted features used in the literature. Finally, it discusses future directions for automatic readability research.

Language Analysis and Calculation

On Semantic Knowledge Resources for Language Understanding and Reasoning

YUAN Yulin, LU Dawei

2018, 32(12): 11-23.

This paper discusses how semantic resources can assist computers in semantic understanding and commonsense reasoning. First, we point out that human beings live in a world of common sense and meaning, and that artificial intelligence robots must understand the meaning of natural language in order to perform commonsense reasoning. Next, we briefly summarize the advantages and disadvantages of the knowledge-based and statistical approaches to natural language processing. We then argue that neither concepts nor language can be truly understood with statistical methods, and that deep learning can hardly account for any knowledge. Using specific cases, the paper shows that the Information Dictionary of Notional Word is equipped with semantic role information and the syntactic configurations of words, which can be employed in knowledge graphs and content computing and can serve to improve artificial intelligence. Because the "Qualia Role" describes the encyclopedic knowledge of nouns, it can be used to answer commonsense questions such as what something is (formal role), what it consists of (constitutive role), what it is made of (material role), how it is created (agentive role), and what it is used for (telic role).

Language Analysis and Calculation

A Study on Syntactic Network and Semantic Network

MA Dan, ZHAO Yiyi

2018, 32(12): 24-30.

Network-based depiction of language is a major method of linguistic research. This paper presents a semantics-based network built with Cytoscape. The network includes function words, and both semantic links and syntactic links are established. The analysis reveals that 1) the semantic net has a larger dimension and a larger average minimum distance; 2) the syntactic net is more hierarchical and has a lower clustering coefficient; 3) the function words “de”, “and”, and “ge” serve as local centres.
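The two measures compared in this abstract, average minimum distance and clustering coefficient, can be illustrated on a toy undirected graph. The following is a minimal sketch in plain Python, not the authors' Cytoscape workflow; the adjacency-dictionary representation, the function names, and the example graphs are all assumptions made for illustration.

```python
from collections import deque


def average_shortest_path(adj):
    """Mean shortest-path length over all ordered pairs of reachable nodes.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    total, pairs = 0, 0
    for src in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient over all nodes."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # fewer than two neighbours: no possible triangle
            continue
        nb = list(nbrs)
        # Count edges that exist between pairs of neighbours of u.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Hypothetical toy graphs: a fully connected triangle and a hub with three leaves.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
```

In the star graph the hub plays the role of a local centre: every path between leaves passes through it, and no two of its neighbours are linked, so the clustering coefficient is zero.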

Language Analysis and Calculation

Research on Non-projective Structure Based on the Chinese Abstract Meaning Representation Corpus

WEN Yuan, SONG Li, WU Taizhong, LI Bin, ZHOU Junsheng, QU Weiguang

2018, 32(12): 31-40.

A non-projective structure is one in which the order of the word nodes in the dependency tree does not match the word order of the original sentence. It has not been discussed for Chinese, because Chinese dependency corpora have been constructed following only the projection principle. In this paper, we construct a Chinese abstract meaning representation (AMR) corpus of 10 149 sentences, of which 31.62% contain non-projective structures. We then distinguish three main types of non-projective structure: modal words, topicalization, and component separation. Finally, we provide solutions for handling these structures in AMR parsing.
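Non-projectivity is commonly tested by looking for crossing arcs in a dependency tree. The following is a minimal sketch of that standard check, not the procedure used on the AMR corpus in this paper; the head-list encoding (1-based head index per word, 0 for the root) and the function names are assumptions made for illustration.

```python
def crossing_arcs(heads):
    """Return every pair of dependency arcs that cross.

    heads[i] is the 1-based position of the head of word i + 1,
    with 0 standing for an artificial root placed before the sentence.
    """
    # Represent each arc by its left and right endpoints.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    crossings = []
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if a < c < b < d or c < a < d < b:
                crossings.append(((a, b), (c, d)))
    return crossings


def is_projective(heads):
    """A tree is projective when no two of its arcs cross."""
    return not crossing_arcs(heads)


# Hypothetical examples: the first tree is projective, the second is not.
projective_tree = [2, 0, 2]         # word 2 is the root; words 1 and 3 attach to it
non_projective_tree = [3, 4, 0, 3]  # arc 1-3 crosses arc 2-4
```

The check is quadratic in sentence length, which is acceptable for corpus statistics such as the proportion of sentences containing at least one crossing.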

Language Analysis and Calculation

A Neural Transition-based Dependency Parsing Model with Fused Multi-feature Encoding

LIU Mingtong, ZHANG Yujie, XU Jin’an, CHEN Yufeng

2018, 32(12): 41-47.

For neural network-based dependency parsing, this paper presents a novel transition-based architecture that leverages fused multi-feature encoding. We model the stack states with subtree representations and encode structural dependency subtrees with TreeLSTM. In particular, we propose an LSTM-based technique to encode the history of parsed dependency arcs and states as global features. Finally, based on fused multi-feature encoding, we combine the extracted local and global features to make parsing decisions. Experiments on the Penn Chinese Treebank (CTB5) show that our parser reaches 87.8% unlabeled and 86.8% labeled attachment accuracy with a greedy strategy, which effectively improves neural transition-based dependency parsing.

Language Analysis and Calculation

A Transition-based Framework for Chinese Discourse Structure Parsing

SUN Cheng, KONG Fang

2018, 32(12): 48-56.

As a subtask of discourse analysis, generating a proper discourse structure is critical for discourse comprehension and downstream discourse applications. Based on a Chinese discourse treebank annotated under the connective-driven dependency tree scheme, a complete framework for generating Chinese discourse structure is proposed. Statistics on the Chinese discourse corpus are reported, along with an evaluation protocol for measuring the performance of a discourse parser. Different distributed representation approaches are also compared for their effectiveness in encoding discourse substructures.

Language Resources Construction

A Semi-automatic Construction Method of Semantic Knowledge Base of Aviation Terms

Wang Sibo, Wang Peiyan, Zhang Guiping

2018, 32(12): 57-66.

Semantic knowledge bases are a basic resource for natural language processing. Existing large-scale semantic knowledge bases are mostly generic and lack domain-specific semantic knowledge. This paper proposes a semi-automatic method for constructing a semantic knowledge base of aviation terms using HowNet. The construction consists of four key processes and yields a total of 2 000 descriptions of term concepts (DEF). Finally, the validity of the method is verified by comparing the term similarities obtained by manual annotation with those computed from the term DEFs.

Language Resources Construction

Multi-translation Based Chinese Paraphrase: Evaluation Metric and Corpus

RUAN Chong, SHI Wenxian, LI Yanhao, WENG Yijia, HU Junfeng

2018, 32(12): 67-73.

Paraphrase corpora are fundamental to research on paraphrase phenomena, yet Chinese paraphrase corpora are hardly available in academia. In this paper, we collect multiple Chinese translations of the novel Jane Eyre and obtain roughly 50 000 parallel paraphrase sentences. From these we extract more than 9 000 pairs of lexical paraphrase knowledge. We further modify METEOR, an automatic machine translation evaluation metric, to better evaluate Chinese paraphrase quality, and we provide a Chinese paraphrase evaluation dataset. A closed test shows that the mined knowledge is of better quality than that of Tongyici Cilin.

Ethnic Language Processing and Cross Language Processing

Tibetan Entity Relation Extraction Based on Joint Model

XIA Tianci, SUN Yuan

2018, 32(12): 76-83.

Extracting entities and the relationships between them from unstructured text is a challenging issue. This paper applies a joint model to Tibetan to perform entity identification and relation extraction at the same time. An end-to-end BiLSTM sequence labelling framework is adopted, and POS information is integrated to enhance performance. It is also demonstrated that character-level processing is more effective for Tibetan than word-level processing. The experimental results show that the method improves accuracy by 30% to 40% compared with SVM and LR.

Information Extraction and Text Mining

A Bilingual News Summarization in Chinese and Vietnamese Based on Multiple Features

YE Lei, YU Zhengtao, GAO Shengxiang, LIU Shulong, ZHANG Yafei

2018, 32(12): 84-91.

To generate a summary for a news event reported in both Chinese and Vietnamese, a multi-feature fusion method for bilingual news summarization is proposed. It exploits the cross-lingual correlations between sentences in the news text. First, the method analyzes the co-occurrence degree of news elements and the similarity between sentences. Then, these two features are integrated into an undirected graph, and a ranking algorithm is used to sort the sentences. Finally, important sentences are selected and redundancy is removed to generate the summary. Experiments on a Chinese and Vietnamese bilingual news archive show that the proposed method achieves good results.
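The graph-ranking step can be sketched as follows. The abstract does not name the ranking algorithm or say how the two features are fused, so this sketch assumes a TextRank-style weighted PageRank and a simple linear blend of the two feature matrices; the function name, the `alpha` weight, the damping factor, and the toy matrices are all hypothetical, and the redundancy-removal step is not shown.

```python
def rank_sentences(sim, cooc, alpha=0.5, damping=0.85, iters=50):
    """Rank sentence indices by a weighted PageRank over an undirected graph.

    sim[i][j]  : similarity between sentences i and j (symmetric)
    cooc[i][j] : co-occurrence degree of news elements in sentences i and j
    alpha      : assumed blending weight between the two features
    """
    n = len(sim)
    # Edge weight is an assumed linear blend of the two features; no self-loops.
    weight = [
        [0.0 if i == j else alpha * sim[i][j] + (1 - alpha) * cooc[i][j]
         for j in range(n)]
        for i in range(n)
    ]
    strength = [sum(row) for row in weight]
    score = [1.0 / n] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            # Each neighbour passes on its score in proportion to the edge weight.
            incoming = sum(
                weight[j][i] / strength[j] * score[j]
                for j in range(n)
                if strength[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        score = updated
    return sorted(range(n), key=lambda i: score[i], reverse=True)


# Hypothetical toy input: sentence 0 is strongly related to both other sentences.
toy_sim = [
    [0.0, 0.9, 0.9],
    [0.9, 0.0, 0.1],
    [0.9, 0.1, 0.0],
]
toy_cooc = toy_sim
ranking = rank_sentences(toy_sim, toy_cooc)
```

A summary would then be assembled by walking down `ranking` and skipping any sentence too similar to one already chosen, which corresponds to the selection and redundancy-removal step described in the abstract.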

Information Extraction and Text Mining

Sequence to Sequence Model to Ellipsis Recovery for Chinese Short Text

ZHENG Jie, KONG Fang, ZHOU Guodong

2018, 32(12): 92-99.

Ellipsis is a common linguistic phenomenon and is especially frequent in short texts such as QA and dialogue. This paper builds a sequence-to-sequence neural network model to identify and recover ellipsis in short texts. Experiments on a collected and curated short-text corpus for QA and dialogue demonstrate that the proposed model performs well in ellipsis identification and recovery.

Information Extraction and Text Mining

Pseudo Topic Analysis Based on Topic Network

YAN Rong, GAO Guanglai

2018, 32(12): 100-108.

This paper proposes a novel pseudo topic analysis approach based on the community structure of the topic network and the relationships between topics. It represents text semantics from the perspective of network structure, which complements existing statistical topic modeling methods.

Information Retrieval and Question Answering

End-to-end Dialogue Control for Utterances without Slot Values in Task-oriented Dialogue System

HUANG Qiangjia, HUANG Peijie, LI Yanghui, DU Zefeng

2018, 32(12): 109-117.

End-to-end dialogue control for utterances without slot values is a challenging issue. This paper proposes an end-to-end hybrid coding network that combines explicit utterance features with implicit context information to handle utterances without slot information. Specifically, on top of the feature representations that a convolutional neural network (CNN) extracts from the "explicit" dialogue sequence, the system action classification model is further enriched by constructing and capturing the "implicit" background system context information in the dialogue sequence. Experiments on a task-oriented, restricted-domain Chinese SDS show that, compared with existing methods, the proposed method achieves significant improvements in both per-response accuracy and per-dialog accuracy.

Sentiment Analysis and Social Computing

Cross-lingual Sentiment Classification Based on Auto-associative Memory and Convolutional Neural Networks

LIU Jiao, CUI Rongyi, ZHAO Yahui

2018, 32(12): 118-124.

A cross-lingual sentiment classification algorithm based on semantic fusion is proposed for product reviews. First, word vectors for the different languages are generated in advance with the open-source tool Word2Vec. Then, an auto-associative memory relationship is proposed to extract cross-lingual document semantics according to the statistical relevance of word vectors across languages. The local perception and weight sharing techniques of convolutional neural networks are applied to fuse the complex semantic expressions in the auto-associative memory model, so as to generate phrase features of different lengths. A dense combination of high-level semantic features is learned by a deep neural network for all languages, which paves the way for classification. It is demonstrated that, for positive and negative sentiment classification on a cross-lingual sentiment corpus, the proposed model is much more effective than other existing algorithms.

Machine Translation

Self-attention Based Machine Reading Comprehension

ZHANG Haoyu, ZHANG Pengfei, LI Zhenzhen, TAN Qingping

2018, 32(12): 125-131.

Machine reading comprehension has attracted considerable attention in the field of natural language processing. For the Chinese machine reading comprehension dataset DuReader, this paper presents an extractive model called Mixed Model, which combines multiple strategies including a recurrent neural network, paragraph fusion, and a self-attention mechanism. The proposed method achieves a ROUGE-L score of 54.2 and a BLEU-4 score of 49.14 on the DuReader test set.

Machine Translation

Machine Comprehension Model on Deep Hierarchical Features

HUO Huan, WANG Zhongmeng

2018, 32(12): 132-142.

For Chinese machine reading tasks with multiple passages and continuous answer spans, this paper proposes a model based on deep hierarchical features that extracts features at three levels: details, snippets, and full texts. In this model, words represented by word vectors are encoded in a recurrent layer to obtain the detail features. The snippet features are constructed through several convolution layers and highway layers. The full-text features are extracted from candidate passages to perform an overall inspection. Finally, these features are used to determine the passage in which the answer is located and the answer span within that passage. In experiments on the 2018 NLP Challenge on Machine Reading Comprehension, the proposed model achieves a ROUGE-L score of 57.55 and a BLEU-4 score of 50.87.

2018 Volume 32 Issue 12 Published: 17 December 2018