2019 Volume 33 Issue 4 Published: 19 April 2019
  

  • Language Analysis and Calculation
    WU Taizhong, GU Min, ZHOU Junsheng, QU Weiguang, LI Bin, GU Yanhui
    2019, 33(4): 1-11.
    Abstract Meaning Representation (AMR) is a domain-independent, sentence-level semantic representation that abstracts the meaning of a sentence into a single directed acyclic graph. AMR parsing aims to map sentences to their corresponding AMR graphs. This paper presents a preliminary study of Chinese AMR parsing based on the characteristics of Chinese AMR and a transition-based neural network. We first propose an incremental baseline for Chinese AMR parsing that uses transition-based decoding. We then improve the model with semantic representations of dependency paths and context information. Finally, we perform concept recognition in AMR parsing as a sequence labeling task. Experiments show that the proposed model outperforms the baseline, reaching a Smatch F1 of 0.61 on Chinese AMR parsing.
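    Smatch scores a parsed AMR graph by the F1 of its (relation, source, target) triples against the gold graph. The sketch below is a simplification that assumes the variables of the two graphs are already aligned; the real Smatch metric additionally searches (by hill climbing) for the variable mapping that maximizes the number of matched triples. The function name and the toy graph are illustrative only, not taken from the paper.

```python
def triple_f1(test_triples, gold_triples):
    """Precision, recall and F1 over AMR triples.

    Simplifying assumption: variable names in both graphs already refer to the
    same nodes, so matching is plain set intersection (no mapping search).
    """
    test, gold = set(test_triples), set(gold_triples)
    matched = len(test & gold)
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / len(test)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy graph for "the boy wants to go" (illustrative, not from the paper's data).
gold = {
    ("instance", "w", "want-01"), ("instance", "b", "boy"), ("instance", "g", "go-01"),
    ("ARG0", "w", "b"), ("ARG1", "w", "g"), ("ARG0", "g", "b"),
}
# A parse that mislabels one edge: 5 of its 6 triples match the gold graph.
test = (gold - {("ARG0", "g", "b")}) | {("ARG1", "g", "b")}
print(triple_f1(test, gold))
```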
  • Language Analysis and Calculation
    DIAO Yufeng, YANG Liang, LIN Hongfei, WU Di, FAN Xiaochao, XU Bo, XU Kan
    2019, 33(4): 12-19,28.
    Homographic puns are a common source of humor in jokes and other comedic works, but detecting them and locating the pun words is difficult. We design a series of latent semantic characteristics and corresponding features to detect homographic puns. We then propose a semantic similarity matching algorithm that locates pun words by fusing word embeddings with synsets. Experimental results on SemEval 2017 Task 7 and Pun of the Day demonstrate the effectiveness of the proposed method.
  • Language Resources Construction
    LIU Pengyuan, LIU Yujie
    2019, 33(4): 20-28.
    Noun compounds, an important linguistic phenomenon, have recently attracted close attention in the NLP community. A relatively large-scale knowledge base of noun compound semantic relations already exists for English. To build a comparable Chinese resource, this paper annotates and analyzes basic noun compounds in a large-scale real corpus, establishing a semantic relation hierarchy for basic noun compounds and a corresponding syntactic and semantic knowledge base for Chinese. The knowledge base currently contains 18 281 high-frequency basic noun compounds, each annotated with its semantic relation, phrase structure and referential entity information. The two nouns in each compound are further annotated with semantic categories according to the SKCC of Peking University. Based on this knowledge base, we also provide preliminary statistics and an analysis of the syntax and semantics of basic noun compounds.
  • Knowledge Representation and Acquisition
    YE Zhonglin, ZHAO Haixing, ZHANG Ke, ZHU Yu
    2019, 33(4): 29-36.
    Words, as the basic semantic units of language models, are strongly related to their context words across the whole semantic space. Word representation learning maps the relationships between words and their context words into a low-dimensional vector space using shallow neural network models. However, existing word representation learning methods usually consider only the syntagmatic relations between words and do not directly capture paradigmatic information. This paper proposes DEWE, a word representation learning algorithm that integrates the semantic information of the word itself into representation training. The structural and semantic generalization of DEWE is evaluated on six word similarity datasets, and it performs well on all of them.
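    Word similarity datasets of the kind used to evaluate DEWE are usually scored by computing the cosine similarity of each word pair's vectors and then Spearman's rank correlation with the human ratings. The sketch below assumes that common protocol (the paper's exact evaluation procedure is not described in the abstract); the vectors and ratings are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_ranks(values):
    """1-based ranks (smallest value = rank 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_similarity(embeddings, dataset):
    """dataset: list of (word1, word2, human_score); pairs with an OOV word are skipped."""
    model, human = [], []
    for w1, w2, score in dataset:
        if w1 in embeddings and w2 in embeddings:
            model.append(cosine(embeddings[w1], embeddings[w2]))
            human.append(score)
    return spearman(model, human)


# Toy vectors and ratings, for illustration only.
embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "truck": [0.3, 0.9]}
dataset = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.0), ("dog", "truck", 2.0)]
print(evaluate_similarity(embeddings, dataset))
```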
  • Knowledge Representation and Acquisition
    WANG Hengsheng, LIU Tong, REN Jin
    2019, 33(4): 37-47.
    Targeting a specific application of natural-language dialog systems, namely a campus information inquiry system, this paper proposes a method for improving the semantic expressiveness of word embeddings. In addition to word contexts, domain-specific knowledge is introduced into embedding training. The application knowledge is organized into an ontology, which is incorporated into the word embeddings through multi-task training of a neural network model adapted from skip-gram; the ontology acts both as a constraint on and as an enhancement of the embeddings. Experiments demonstrate the validity of the proposed embeddings.
  • Knowledge Representation and Acquisition
    FANG Fang, WANG Ya, WANG Shi, FU Jianhui, CAO Cungen
    2019, 33(4): 48-59.
    Knowledge acquisition from texts is an important research topic in artificial intelligence. We present a method, based on semantic grammar, for acquiring knowledge from Chinese records of cyber attack events. First, we introduce a framework of semantic taxonomy and description (FSTD), modeled on FrameNet, as an extension of the taxonomy of basic sentence patterns in modern Chinese. Second, we focus on the design of the "suffering" category in the semantic taxonomy, which is the most common category in Chinese records of cyber attack events. We then apply the FSTD to the cyber attack domain and build the cyber attack FSTD. We also discuss the problems encountered in building the cyber attack FSTD, including role determination in the semantic grammar, compound sentence design, analysis of sentences containing “的是”, and predicate design. Experiments on a real corpus provided by a national security department show that our method achieves high accuracy.
  • Machine Translation
    CAI Jia, WANG Xiangdong, TANG Lizhen, CUI Xiaojuan, LIU Hong, QIAN Yueliang
    2019, 33(4): 60-67.
    Chinese-Braille conversion has applications in areas such as Braille publishing and education for the blind. This paper presents a deep learning approach to automatic Chinese-Braille conversion based on parallel corpora. A bidirectional LSTM model is trained on Chinese texts segmented according to the Braille word segmentation rules and achieves high accuracy in Braille word segmentation. To support model training, this paper also presents a strategy for automatically generating a corpus from Chinese and Braille texts with the same content, aligned at the article, sentence and word levels and totaling 270 000 sentences, 2.34 million Chinese characters and 4.48 million Braille symbols. Experimental results show that the proposed method outperforms existing models.
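    A common way to train a BiLSTM word segmenter is to convert each segmented sentence into per-character labels, for example the BMES scheme (Begin, Middle or End of a multi-character word, or a Single-character word), and to decode predicted labels back into words. The sketch below assumes that scheme; the abstract does not state which tag set the authors used, and the example segmentation is illustrative rather than an application of the actual Braille segmentation rules.

```python
def to_bmes(words):
    """Turn a segmented sentence into characters and BMES tags.

    B/M/E = begin/middle/end of a multi-character word, S = single-character word.
    """
    chars, tags = [], []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
        chars.extend(word)
    return chars, tags


def from_bmes(chars, tags):
    """Rebuild words from characters and (predicted) BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("S", "E"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


# Illustrative segmentation ("we / are / Chinese"), not the actual Braille rules.
chars, tags = to_bmes(["我们", "是", "中国人"])
print(tags)  # → ['B', 'E', 'S', 'B', 'M', 'E']
```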
  • Other Language in/around China
    SE Chajia, HUA Guocairang, CAI Rangjia, CI Zhenjiacuo, ROU Te
    2019, 33(4): 68-74.
    This paper proposes an attention-based end-to-end model for generating Tibetan poems that requires no manual feature engineering. Within a BiLSTM framework, Tibetan word embeddings, an attention mechanism and multi-task learning are introduced. Experimental results show that the proposed method achieves a BLEU score of 59.27% and a ROUGE score of 62.34%.
  • Information Extraction and Text Mining
    WU Xiaolong, CAO Cungen
    2019, 33(4): 75-84.
    Extracting knowledge from Web tables is an important way to obtain high-quality knowledge and is of substantial significance for knowledge graphs, Web mining and related areas. Classical methods depend on well-formed table structures or sufficient pre-existing knowledge; to overcome this limitation, we propose a novel Web table knowledge extraction method for large-scale Web tables based on fast clustering with equivalent compression. By making full use of the structural characteristics of tables, we group tables with similar structures through unsupervised clustering and then infer the semantic structure of each group of similar tables for knowledge extraction. Results show that the proposed clustering algorithm reduces the clustering time for 5,000 tables from 72 hours to 20 minutes at the same level of clustering accuracy, and the knowledge triples extracted by table templates after clustering achieve high accuracy.
  • Information Extraction and Text Mining
    QIN Yanxia, WANG Zhongqing, ZHENG Dequan, ZHANG Min
    2019, 33(4): 85-92.
    Neural network-based feature learning methods have proven effective for Chinese and English event detection. This paper further explores combined character- and word-level neural features to address the out-of-vocabulary (OOV) problem in Chinese event detection. Two neural network models are used to learn word-level and character-level representations, respectively, and the hybrid representation of each word is obtained by concatenating the two. Experimental results show that the proposed hybrid-representation neural model for Chinese event detection outperforms the state of the art by 2.5% in F1.
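    The hybrid representation concatenates, for every word, a word-level vector with a vector built from its characters, so an out-of-vocabulary word still carries character information. In the sketch below the character-level encoder is replaced by simple averaging of character embeddings, a hypothetical stand-in for the neural model used in the paper; the `<unk>` fallback and all vectors are toy assumptions.

```python
UNK = "<unk>"


def char_level_vector(word, char_emb, dim):
    """Hypothetical stand-in for a character-level encoder: mean of character vectors."""
    vecs = [char_emb[c] for c in word if c in char_emb]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def hybrid_vector(word, word_emb, char_emb, char_dim):
    """Concatenate the word-level vector (UNK for OOV words) with the character-level one."""
    word_vec = word_emb.get(word, word_emb[UNK])
    return word_vec + char_level_vector(word, char_emb, char_dim)  # list concatenation


# Toy embeddings: "袭击" (attack) is in the word vocabulary, "爆炸" (explode) is not.
word_emb = {"袭击": [0.5, 0.1], UNK: [0.0, 0.0]}
char_emb = {"袭": [1.0, 0.0, 0.0], "击": [0.0, 1.0, 0.0],
            "爆": [0.0, 0.0, 1.0], "炸": [1.0, 1.0, 1.0]}
print(hybrid_vector("爆炸", word_emb, char_emb, char_dim=3))  # → [0.0, 0.0, 0.5, 0.5, 1.0]
```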
  • Information Extraction and Text Mining
    WANG Kaixiang, REN Ming
    2019, 33(4): 93-100.
    This paper proposes a query-based automatic text summarization method aimed at meeting users' information needs for news. Each sentence is weighted according to its TF-IDF score, its similarity to the query, and the time it refers to (with a bias toward recent news). Maximal Marginal Relevance (MMR) is then used to select the summary sentences. Compared with six existing methods, the proposed method achieves better ROUGE scores.
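    Maximal Marginal Relevance selects sentences greedily, each time taking the candidate that maximizes lam * relevance - (1 - lam) * (maximum similarity to the sentences already selected). The sketch below assumes that a sentence's relevance is a linear combination of its TF-IDF score, query similarity and recency; the weights and the value of `lam` are hypothetical, not the paper's settings.

```python
def sentence_weight(tfidf, query_sim, recency, weights=(0.4, 0.4, 0.2)):
    """Hypothetical linear combination of the three signals named in the abstract."""
    w_tfidf, w_query, w_time = weights
    return w_tfidf * tfidf + w_query * query_sim + w_time * recency


def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    relevance[i]: weight of sentence i (e.g. from sentence_weight)
    similarity[i][j]: similarity between sentences i and j
    lam: trade-off, 1.0 = pure relevance, lower values favor diversity
    """
    selected = []
    candidates = list(range(len(relevance)))

    def mmr_score(i):
        # Redundancy = similarity to the closest already-selected sentence.
        redundancy = max((similarity[i][j] for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Sentences 0 and 1 are near-duplicates; sentence 2 is less relevant but distinct.
relevance = [0.9, 0.85, 0.3]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
print(mmr_select(relevance, similarity, k=2, lam=0.5))  # → [0, 2]
```

    With `lam=1.0` the same call returns `[0, 1]`, i.e. plain top-k by relevance; lowering `lam` trades relevance for less redundancy.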
  • Information Extraction and Text Mining
    LIU Maofu, QI Qiaosong, HU Huijun
    2019, 33(4): 101-108.
    Football news is usually written by experts or journalists. This paper proposes a method for generating news directly from football live broadcast scripts, based on a convolutional neural network and the structure of football news texts. The method locates important events across multiple periods of a match and then extracts relevant sentences to generate the news report. It also generates a brief summary of the match commentary. Experimental results show that the proposed method can feasibly generate football match news from live broadcast scripts.
  • Information Extraction and Text Mining
    ZHANG Xuan, LIANG Xun, LI Zhiyu, ZHANG Shusen, ZHAO Xiaolei
    2019, 33(4): 109-119.
    This paper proposes a model for recognizing relationships between fictional characters based on complex network analysis. Taking Jin Yong’s fourteen martial arts novels as a case study, we build a noise-reduction analysis framework for fiction social networks, a model for assessing intimacy between characters, and a relationship discriminant, which together form a general model for identifying romantic relationships between the protagonists of a novel. Experimental results show that the proposed model achieves high accuracy and efficiency. They also reveal that reducing the sliding window size improves recall without sacrificing accuracy, up to a certain threshold.
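    Character networks of this kind are commonly built by linking two characters whenever their names occur within a sliding window of each other, with edge weights counting co-occurrences. The sketch below assumes windows measured in token positions and uses made-up mention positions; it does not reproduce the paper's noise reduction, intimacy assessment or relationship discriminant.

```python
from collections import Counter


def cooccurrence_network(mentions, window):
    """Build a weighted character network from name mentions.

    mentions: list of (position, character_name) sorted by position.
    Two different characters gain one unit of edge weight each time they are
    mentioned at most `window` positions apart.
    """
    edges = Counter()
    for i, (pos_i, name_i) in enumerate(mentions):
        for pos_j, name_j in mentions[i + 1:]:
            if pos_j - pos_i > window:
                break  # mentions are sorted, so later ones are even farther away
            if name_i != name_j:
                edges[tuple(sorted((name_i, name_j)))] += 1
    return edges


# Hypothetical mention positions (token offsets) in a passage.
mentions = [(0, "Guo Jing"), (3, "Huang Rong"), (10, "Guo Jing"),
            (12, "Huang Rong"), (30, "Ouyang Feng"), (33, "Guo Jing")]
print(cooccurrence_network(mentions, window=5))
```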
  • Question Answering, Dialogue System and Machine Reading Comprehension
    JIANG Mingqi, SHEN Chenlin, LI Shoushan
    2019, 33(4): 120-126.
    Attribute classification, an essential subtask of aspect-based sentiment classification, aims to automatically identify the attribute category of a text. Unlike existing studies on attribute classification in news and review texts, this paper focuses on question-answer (QA) text pairs and proposes a novel approach called multi-dimension textual representation. First, we segment the question text of a QA pair into sentences. Then, we use LSTM models to encode each sentence of the question text and the whole answer text. Finally, we use a CNN layer to extract important information from all sentences of the question text and the whole answer text. Experiments demonstrate the effectiveness of the proposed approach.
  • Question Answering, Dialogue System and Machine Reading Comprehension
    WANG Zhenyu, XIE Yanlu, ZHANG Jinsong
    2019, 33(4): 127-134.
    With the continuous development of automatic speech recognition, the verification and evaluation of pronunciation errors made by second language (L2) learners has become one of the most important research topics in computer-assisted pronunciation training. To cope with the lack of labeled mispronunciation speech data, this paper proposes a method based on acoustic phone embeddings and a Siamese network. The system takes as input a pair of acoustic phone segments with a pairwise label, and a neural network maps the speech features to high-level representations that distinguish different types of phones. The Siamese network is trained to tell whether its two output embeddings come from the same type of phone. Results show that the Siamese network with a cosine hinge loss achieves the best accuracy, 89.93%, and a diagnosis accuracy of 89.19% in the pronunciation error verification task.
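    The abstract does not give the exact form of the cosine hinge loss. One common pairwise formulation, assumed in the sketch below, pulls same-phone pairs toward cosine similarity 1 and applies a hinge to different-phone pairs so they are penalized only while their cosine exceeds a margin. The `margin` and `threshold` values and the verification helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_hinge_loss(emb1, emb2, same_phone, margin=0.5):
    """Pairwise loss on the two output embeddings of a Siamese network.

    Same-phone pairs are pulled toward cosine 1; different-phone pairs are
    penalized only while their cosine exceeds `margin` (the hinge).
    """
    cos = cosine(emb1, emb2)
    if same_phone:
        return 1.0 - cos
    return max(0.0, cos - margin)


def same_phone_type(emb1, emb2, threshold=0.5):
    """Verification decision: same phone type if the cosine exceeds the threshold."""
    return cosine(emb1, emb2) > threshold


print(cosine_hinge_loss([1.0, 0.0], [1.0, 0.0], same_phone=False))  # → 0.5
```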
  • NLP Application
    XU Mingyue, JIANG Jie, LI Yi, QIU Hongbin
    2019, 33(4): 135-142.
    This paper investigates the automatic evaluation of Chinese passages handwritten online for hard-pen writing practice on digital input devices such as tablets (PADs). From the time-stamped point sets of the handwriting, we first extract lines and words, and then compute line levelness, line spacing stability, line spacing uniformity, word spacing uniformity, and left alignment. From these feature parameters, an expert-driven heuristic is derived to produce a writing quality score. Experiments show that the system's results are relatively consistent with subjective evaluations.
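    The abstract does not define how the uniformity features are computed. As one hypothetical possibility, the sketch below scores the uniformity of a list of gaps (between adjacent characters in a line, or between adjacent lines) as one minus their coefficient of variation, clipped to [0, 1]; the bounding-box input format is also an assumption.

```python
import statistics


def gaps_from_boxes(boxes):
    """Horizontal gaps between adjacent characters.

    boxes: (left, right) x-extents of the characters in one line, sorted left to right.
    """
    return [boxes[i + 1][0] - boxes[i][1] for i in range(len(boxes) - 1)]


def spacing_uniformity(gaps):
    """Hypothetical score in [0, 1]: 1 minus the coefficient of variation of the gaps."""
    if len(gaps) < 2:
        return 1.0  # nothing to compare
    mean = statistics.mean(gaps)
    if mean <= 0:
        return 0.0  # non-positive mean gap (overlapping characters): worst case
    cv = statistics.pstdev(gaps) / mean
    return max(0.0, 1.0 - cv)


line = [(0, 10), (12, 22), (26, 36)]  # toy character extents, in pixels
print(spacing_uniformity(gaps_from_boxes(line)))
```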