Journal of Chinese Information Processing

Language Analysis and Calculation

Chinese AMR Parsing using Transition-based Neural Network

WU Taizhong, GU Min, ZHOU Junsheng, QU Weiguang, LI Bin, GU Yanhui

2019, 33(4): 1-11.

Abstract Meaning Representation (AMR) is a domain-independent sentence semantic representation that abstracts the semantics of a sentence into a single directed acyclic graph. AMR parsing aims to parse sentences into their corresponding AMR graphs. This paper presents a preliminary study of Chinese AMR parsing based on the features of Chinese AMR and a transition-based neural network. First, an incremental baseline strategy for Chinese AMR parsing is proposed that uses a transition-based decoding method. Then, semantic representations of dependency paths and context information are used to improve the model. Finally, concept recognition in AMR parsing is performed by sequence labeling. Experiments demonstrate that the proposed model outperforms the baseline, yielding a Smatch F1 of 0.61 on Chinese AMR parsing.

Language Analysis and Calculation

Homographic Puns Detection and Puns Location Based on Latent Semantic Characteristics

DIAO Yufeng, YANG Liang, LIN Hongfei, WU Di, FAN Xiaochao, XU Bo, XU Kan

2019, 33(4): 12-19,28.

Homographic puns are a common source of humor in jokes and other comedic works, but they are hard to detect and their pun words are hard to locate. We design a series of latent semantic characteristics and corresponding features to detect homographic puns. Then, a semantic similarity matching algorithm that fuses word embeddings and synsets is proposed to locate pun words. Experimental results on SemEval 2017 Task 7 and Pun of the Day demonstrate the effectiveness of the proposed method.

Language Resources Construction

Semantic Relations Hierarchy and Knowledge Base for Chinese Basic Noun Compounds

LIU Pengyuan, LIU Yujie

2019, 33(4): 20-28.

As an important linguistic issue, the noun compound has recently attracted close attention in the NLP community. For English, a relatively large-scale knowledge base of noun compound semantic relations has been established. To build a similar resource for Chinese, this paper tags and analyzes basic noun compounds in a large-scale real corpus, and establishes a semantic relation hierarchy for basic noun compounds together with the corresponding syntactic and semantic knowledge base for Chinese. So far, the knowledge base contains 18 281 high-frequency basic noun compounds, each labeled with semantic relation, phrase structure and referential entity information. The two nouns in each compound are further annotated with their semantic categories according to the SKCC of Peking University. Based on this knowledge base, we also provide preliminary statistics and analysis of the syntax and semantics of basic noun compounds.

Knowledge Representation and Acquisition

Description Constrained Word Embedding

YE Zhonglin, ZHAO Haixing, ZHANG Ke, ZHU Yu

2019, 33(4): 29-36.

Words, as the basic semantic units in language models, are strongly related to their context words in the whole semantic space. Word representation learning aims to map the relationships between words and their context words into a low-dimensional vector space using shallow neural network models. However, existing word representation learning methods usually consider only the syntagmatic relations between words, without directly capturing paradigmatic information. In this paper, a new word representation learning algorithm, DEWE, is proposed to integrate the semantic information of the word itself into the training of word representations. The structural and semantic generalization of the proposed method is validated on six similarity evaluation datasets, and DEWE performs well on all of them.

Knowledge Representation and Acquisition

Constraint-enhanced Word Embedding Based on Domain Knowledge

WANG Hengsheng, LIU Tong, REN Jin

2019, 33(4): 37-47.

For the design of a specific natural-language dialog system, namely a campus information inquiry system, this paper proposes a method for improving the semantic expressiveness of word embeddings. In addition to the word contexts used in training the embeddings, domain-specific knowledge is introduced into model training. The knowledge about the application is organized into an ontology, which is incorporated into the word embeddings through multi-task training of a neural network model adapted from skip-gram. This knowledge acts as both a constraint on and an enhancement to the word embeddings. Experiments show the validity of the proposed embeddings.

Knowledge Representation and Acquisition

Knowledge Acquisition from Chinese Records of Cyber Attacks Based on a Framework of Semantic Taxonomy and Description

FANG Fang, WANG Ya, WANG Shi, FU Jianhui, CAO Cungen

2019, 33(4): 48-59.

Knowledge acquisition from texts is an important research topic in artificial intelligence. We present a method of knowledge acquisition from Chinese records of cyber attack events based on semantic grammar. Firstly, we introduce a framework of semantic taxonomy and description (FSTD), modeled on FrameNet, as an extension of the taxonomy of basic sentence patterns in modern Chinese. Secondly, we focus on the design of the "suffering" category in the semantic taxonomy, which is the most common category in Chinese records of cyber attack events. Then we apply the framework to the cyber attack domain and build the cyber attack FSTD. We also discuss the problems encountered in building the cyber attack FSTD, including role determination in the semantic grammar, compound sentence design, the analysis of sentences containing “的是”, and predicate design. Experiments on a real corpus provided by a national security department show that our method reaches high accuracy.

Machine Translation

A Deep Learning Method for Chinese-Braille Conversion Based on Parallel Corpora

CAI Jia, WANG Xiangdong, TANG Lizhen, CUI Xiaojuan, LIU Hong, QIAN Yueliang

2019, 33(4): 60-67.

Chinese-Braille conversion can be applied in fields such as Braille publishing and education for the blind. This paper presents a deep learning solution to automatic Chinese-Braille conversion based on parallel corpora. A bidirectional LSTM model is trained on Chinese texts segmented according to Braille segmentation rules, and it achieves high accuracy in Braille word segmentation. To support model training, this paper also presents a strategy for automatically generating a corpus from Chinese and Braille texts with the same content, aligned at the article, sentence and word levels, totaling 270 000 sentences, 2.34 million Chinese characters, and 4.48 million Braille symbols. The experimental results show that the proposed method outperforms existing models.

Other Language in/around China

Tibetan Poem Generation with Attention Based Encoder-Decoder Model

SE Chajia, HUA Guocairang, CAI Rangjia, CI Zhenjiacuo, ROU Te

2019, 33(4): 68-74.

In this paper, an attention-based end-to-end model is proposed to generate Tibetan poems. The method is trained end to end and involves no manual feature engineering. Within a BiLSTM framework, Tibetan word embeddings, an attention mechanism and multi-task learning are introduced. The experimental results show that the proposed method reaches a BLEU score of 59.27% and a ROUGE score of 62.34%.

Information Extraction and Text Mining

Extracting Knowledge from Web Tables Based on Fast Clustering with Equivalent Compression

WU Xiaolong, CAO Cungen

2019, 33(4): 75-84.

Extracting knowledge from Web tables is an important way to obtain high-quality knowledge, and it is of substantial significance for knowledge graphs, Web mining, and related areas. In contrast to classical methods, which depend on a well-formed table structure or on sufficient pre-existing knowledge, we propose a novel method of Web table knowledge extraction for large-scale Web tables based on fast clustering with equivalent compression. By making full use of the structural characteristics of tables, we group tables with similar structures through unsupervised clustering, and then infer the semantic structure of similar tables for knowledge extraction. The results show that the proposed clustering algorithm decreases the clustering time for 5,000 tables from 72 hours to 20 minutes at the same level of clustering accuracy. The accuracy of the knowledge triples extracted with table templates after clustering indicates that the method is highly satisfactory.

Information Extraction and Text Mining

Hybrid Representation Based Chinese Event Detection

QIN Yanxia, WANG Zhongqing, ZHENG Dequan, ZHANG Min

2019, 33(4): 85-92.

Neural network based feature learning methods have been proven effective in Chinese and English event detection tasks. This paper further explores combined character-level and word-level neural features to address the out-of-vocabulary problem in Chinese event detection. Two neural network models are applied to learn word-level and character-level representations, respectively. A hybrid representation for each word is obtained by concatenating its word-level and character-level representations. Experimental results show that the proposed hybrid representation-based neural Chinese event detection model outperforms state-of-the-art results by 2.5% on F₁.
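
The concatenation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the paper learns both representations with neural networks, whereas this sketch assumes fixed lookup tables and averages character vectors as a stand-in for the character-level model. The names `hybrid_representation`, `word_emb` and `char_emb` are hypothetical.

```python
def hybrid_representation(word, word_emb, char_emb, word_dim, char_dim):
    """Concatenate a word-level vector with a character-level vector.

    Assumption: unknown words and characters map to zero vectors, and the
    character-level vector is a plain average of character embeddings.
    """
    # Word-level part: all zeros when the word is out of vocabulary.
    word_vec = list(word_emb.get(word, [0.0] * word_dim))

    # Character-level part: still informative for out-of-vocabulary words
    # as long as some of their characters are known.
    char_vecs = [char_emb.get(ch, [0.0] * char_dim) for ch in word]
    if char_vecs:
        char_vec = [sum(col) / len(char_vecs) for col in zip(*char_vecs)]
    else:
        char_vec = [0.0] * char_dim

    # Hybrid representation: simple concatenation of the two parts.
    return word_vec + char_vec
```

With this layout, an out-of-vocabulary word such as a rare compound still receives a non-zero vector whenever at least one of its characters has an embedding, which is the intuition behind using character-level features for the out-of-vocabulary problem.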

Information Extraction and Text Mining

Query-based Multi-document Automatic Summarization of News

WANG Kaixiang, REN Ming

2019, 33(4): 93-100.

This paper proposes a query-based automatic text summarization method that targets users' information needs for news. It weights each sentence according to its TF-IDF, its similarity to the query, and the time the sentence refers to (with a bias favoring recent news). Finally, Maximal Marginal Relevance is used to select the summary sentences. Compared with six existing methods, the proposed method is superior in terms of ROUGE.
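
The selection step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: sentence relevance is taken to be a weighted sum of TF-IDF cosine similarity to the query and a recency score in [0, 1], and the function name `mmr_summary` and the parameters `lam` and `w_time` are hypothetical. The abstract does not give the actual weighting formula.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (term -> weight) per token list."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summary(sentences, query, recency, k=2, lam=0.5, w_time=0.2):
    """Pick k sentences by Maximal Marginal Relevance.

    recency[i] is an assumed score in [0, 1], higher for more recent news.
    """
    vectors = tfidf_vectors([s.lower().split() for s in sentences + [query]])
    sent_vecs, query_vec = vectors[:-1], vectors[-1]

    # Assumed relevance: query similarity blended with the recency bias.
    relevance = [(1 - w_time) * cosine(v, query_vec) + w_time * r
                 for v, r in zip(sent_vecs, recency)]

    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Penalize similarity to sentences already in the summary.
            redundancy = max((cosine(sent_vecs[i], sent_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

The parameter `lam` trades relevance against redundancy: values near 1 rank sentences almost purely by relevance, while lower values push near-duplicate sentences out of the summary.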

Information Extraction and Text Mining

A CNN Approach to Football News Generation Based on Discourse Structure

LIU Maofu, QI Qiaosong, HU Huijun

2019, 33(4): 101-108.

Football news is usually written by experts or journalists. This paper proposes a method for generating news directly from football live broadcast scripts, based on a convolutional neural network and the structure of football news text. The method locates important events from multiple periods of a football match and then extracts the relevant sentences to generate football news. It also generates a brief summary of the match commentary. The experimental results show that it is feasible to use the proposed method to generate football match news from live broadcast scripts.

Information Extraction and Text Mining

Identification and Analysis of Love Relationships of Protagonists in Jin Yong’s Fictions

ZHANG Xuan, LIANG Xun, LI Zhiyu, ZHANG Shusen, ZHAO Xiaolei

2019, 33(4): 109-119.

This paper proposes a fiction character relationship recognition model based on complex network analysis. Taking Jin Yong’s fourteen martial arts novels as an example, we build a noise-reduction analysis framework for fiction social networks, together with a character intimacy assessment model and a relationship discriminant. These components form a general model for identifying love relationships between the protagonists of a novel. Experimental results show that the proposed model achieves high accuracy and efficiency. They also show that shrinking the sliding window improves recall without reducing accuracy, up to a certain threshold.

Question Answering, Dialogue System and Machine Reading Comprehension

Attribute Classification for Question-Answer Texts

JIANG Mingqi, SHEN Chenlin, LI Shoushan

2019, 33(4): 120-126.

Attribute classification, an essential part of aspect-based sentiment classification, aims to classify the attribute category automatically. In contrast to existing studies of attribute classification in news and review texts, this paper focuses on question-answer (QA) text pairs and proposes a novel approach called multi-dimension textual representation. Firstly, we segment the question text of a QA text pair into sentences. Then, we use LSTM models to encode each sentence in the question text and the whole answer text. Finally, we use a CNN layer to extract important information from all sentences of the question text and the whole answer text. Experiments demonstrate the effectiveness of the proposed approach.

Question Answering, Dialogue System and Machine Reading Comprehension

Non-native Mispronunciation Verification Using Acoustic Tonal Phone Embedding and Siamese Networks

WANG Zhenyu, XIE Yanlu, ZHANG Jinsong

2019, 33(4): 127-134.

With the continuous development of automatic speech recognition, the verification and evaluation of pronunciation errors made by second language (L2) learners has become one of the most important research topics in computer-assisted pronunciation training. To deal with the lack of labeled mispronunciation speech data, a method based on acoustic phone embedding and a Siamese network is proposed in this paper. A pair of acoustic phone segments with a pair-wise label is used as the system input, and speech features are mapped to a high-level representation through a neural network to differentiate between types of phones. The Siamese network is optimized by telling whether two output embeddings come from the same type of phone or not. Results show that the Siamese network trained with a cosine hinge loss function achieves the best accuracy of 89.93%, and the diagnosis accuracy is 89.19% in the pronunciation error verification task.
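
The abstract does not define its cosine hinge loss, so the sketch below assumes one common formulation for pair-wise labels: same-phone pairs are pulled toward cosine similarity 1, and different-phone pairs are penalized only when their similarity exceeds a margin. The function names and the `margin` parameter are hypothetical, and the paper's exact loss may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense embeddings given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_hinge_loss(emb_a, emb_b, same_phone, margin=0.0):
    """Pair-wise loss over the two output embeddings of a Siamese network.

    Assumed form: 1 - cos for same-phone pairs, and
    max(0, cos - margin) for different-phone pairs.
    """
    cos = cosine_similarity(emb_a, emb_b)
    if same_phone:
        # Same type of phone: loss shrinks as the embeddings align.
        return 1.0 - cos
    # Different types: no penalty once similarity falls below the margin.
    return max(0.0, cos - margin)
```

At verification time, a pair would typically be judged by thresholding the cosine similarity of the two embeddings; the threshold itself would have to be tuned on held-out data.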

NLP Application

Automatic Evaluation of Quality of Online Written Chinese Chapter

XU Mingyue, JIANG Jie, LI Yi, QIU Hongbin

2019, 33(4): 135-142.

This paper investigates the automatic evaluation of online written Chinese chapters for hard-pen writing practice on digital input devices such as a PAD. Based on the time-stamped point sets of the handwriting, we first extract lines and words, and then calculate the line level, line spacing stability, line spacing uniformity, word spacing uniformity, and left alignment. Based on these characteristic parameters, an expert-driven heuristic is derived to generate the writing quality score. The experiments show that the system produces results that are relatively consistent with subjective evaluations.

2019 Volume 33 Issue 4 Published: 19 April 2019