2019 Volume 33 Issue 11 Published: 11 November 2019
  

  • Survey
    LIN Qian, LIU Qing, SU Jinsong, LIN Huan, YANG Jing, LUO Bin
    2019, 33(11): 1-14.
    Machine translation is the process of attempting to convert text from one language to another using computers, and it has become a research issue of great importance in artificial intelligence. With the rapid growth of deep learning research and applications, neural machine translation has become the mainstream approach to machine translation. This paper first introduces the influence of neural machine translation in academia and industry over the past year, then reviews the research progress of neural machine translation, and finally outlines the outlook for its future development.
  • Survey
    ZHANG Chenxin, RAO Yuan, FAN Xiaobing, WANG Shuo
    2019, 33(11): 15-30.
    Event summarization technology based on social media plays an important role in the study of emergency detection, event trend analysis, public opinion analysis and many other areas. Based on a large body of recent research, this paper summarizes the key technologies in the core steps of event summarization, and puts forward four key technical problems and challenges in the process of event context mining and analysis: how to generate event summaries under multimodal information fusion; how to mine events and generate summaries under cross-media heterogeneous data collaboration; how to map the hierarchical and multi-granularity relations of complex events; and how to recognize events and generate summaries under real-time conditions. Meanwhile, this paper discusses the related theories, research progress and research trends, providing new research clues and directions for event summarization mining technology based on social media.
  • Language Analysis and Calculation
    RAO Gaoqi, LI Yuming
    2019, 33(11): 31-38.
    In the evolution of the Chinese language, the use of words is significantly affected by time, resulting in varied diachronic distributions of the lexicon. In this paper, we employ TF-IDF to hierarchically classify the lexicon of a 70-year corpus according to its diachronic distribution. Diachronic text classification, the distribution of part of speech and word length, corpus coverage, and the distribution of usage over time are analyzed, upon which we propose a diachronic hierarchy of the Chinese lexicon.
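    The abstract does not specify the implementation details; as a minimal sketch of the underlying idea, each time slice of the corpus (e.g., a decade) can be treated as one "document" and every word scored by TF-IDF over the slices, so that words concentrated in a few periods stand out. The decade slices and tokenization below are hypothetical.

```python
# Minimal sketch (assumption): score each word by TF-IDF across decade slices of the corpus;
# words with a high maximum score are tied to particular periods, low maxima suggest
# period-independent vocabulary. The slices here are made-up stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer

decade_docs = {                       # hypothetical pre-tokenized decade slices
    "1949-1958": "土改 合作社 互助组",
    "2009-2018": "互联网 微博 高铁",
}

vectorizer = TfidfVectorizer(token_pattern=r"\S+")
tfidf = vectorizer.fit_transform(decade_docs.values())
vocab = vectorizer.get_feature_names_out()

max_scores = tfidf.max(axis=0).toarray().ravel()
for word, score in sorted(zip(vocab, max_scores), key=lambda x: -x[1]):
    print(word, round(score, 3))
```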
  • Language Analysis and Calculation
    WANG Xingjin, ZHOU Lanjiang, ZHANG Jianan, ZHOU Feng
    2019, 33(11): 39-45.
    At present, research on Lao part-of-speech tagging is in its infancy, with limited tagged corpora available. In particular, Lao has absorbed a variety of foreign words, resulting in a large number of rare words. This paper studies the structural characteristics of Lao words and proposes a multi-task Lao part-of-speech tagging model that combines the part-of-speech tagging loss with a main-consonant auxiliary loss. To capture the rich affixes that provide part-of-speech clues in Lao, the model also uses character-level word vectors. In addition, an attention mechanism is employed to deal with the long sentence patterns of Lao. The experimental results show that the proposed method achieves an accuracy of 93.24%.
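    The exact architecture is not given in the abstract; the sketch below only illustrates the multi-task idea of combining a main tagging loss with an auxiliary main-consonant loss through a weighting coefficient. The layer sizes and the weight are assumptions, and the character-level vectors and attention mechanism from the paper are omitted.

```python
# Minimal multi-task loss sketch (assumption): a shared encoder feeds two heads, one
# predicting POS tags and one predicting the main consonant; the two cross-entropy
# losses are combined with a hypothetical trade-off weight lam.
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size, n_pos, n_consonant, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * hidden, n_pos)          # main task head
        self.cons_head = nn.Linear(2 * hidden, n_consonant)   # auxiliary task head

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.pos_head(h), self.cons_head(h)

def multitask_loss(pos_logits, cons_logits, pos_gold, cons_gold, lam=0.3):
    ce = nn.CrossEntropyLoss()
    pos_loss = ce(pos_logits.flatten(0, 1), pos_gold.flatten())
    cons_loss = ce(cons_logits.flatten(0, 1), cons_gold.flatten())
    return pos_loss + lam * cons_loss  # lam is an assumed weight, not from the paper
```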
  • Language Analysis and Calculation
    WANG Rui, LI Bicheng, DU Wenqian
    2019, 33(11): 46-56.
    To exploit both the global and local features of an entity, an entity disambiguation method based on context word vectors and a topic model is proposed. Firstly, a context direction vector is added to the traditional word vector model to represent word order, and the model is used to train topic vectors based on the topic model. Secondly, the entity context similarity, the category topic similarity based on the entity topic, and the entity topic similarity based on the topic vector are calculated, respectively. Finally, the three similarities are merged, and the candidate entity with the highest merged similarity is taken as the target entity. The experimental results show that the new method is effective compared with state-of-the-art methods.
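    The merging scheme is not specified in the abstract; a minimal sketch, assuming a weighted sum of the three similarities with hypothetical weights, is shown below.

```python
# Minimal sketch (assumption): merge the three similarity scores with a weighted sum
# (weights are hypothetical) and pick the candidate entity with the highest score.
def disambiguate(candidates, w=(0.5, 0.3, 0.2)):
    """candidates: list of (entity, context_sim, category_topic_sim, topic_vector_sim)."""
    def merged(c):
        return w[0] * c[1] + w[1] * c[2] + w[2] * c[3]
    return max(candidates, key=merged)[0]

# Usage with made-up scores:
print(disambiguate([("Apple Inc.", 0.82, 0.65, 0.71), ("apple (fruit)", 0.40, 0.30, 0.25)]))
```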
  • Language Analysis and Calculation
    YU Jingsong, WEI Yi, ZHANG Yongwei
    2019, 33(11): 57-63.
    Ancient Chinese differs from modern Chinese in vocabulary and grammar. Since there are no explicit sentence boundaries in ancient Chinese texts, today's readers find them hard to understand, and segmenting ancient texts is difficult and requires expertise in a variety of fields. We investigate automatic text segmentation and punctuation based on recent deep learning techniques. By pre-training a BERT (Bidirectional Encoder Representations from Transformers) model on ancient Chinese texts ourselves, we obtain the current state-of-the-art results on both tasks via fine-tuning. Compared with traditional statistical methods and the current BiLSTM+CRF solution, our approach significantly outperforms them, achieving F1-scores of 89.97% and 91.67% on a small-scale single-category corpus and a large-scale multi-category corpus, respectively. In particular, our approach shows good generalization ability, achieving an F1-score of 88.76% on an entirely new Taoist corpus. On the punctuation task, our method reaches an F1-score of 70.40%, exceeding the baseline BiLSTM+CRF model by 12.15%.
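    The paper pre-trains its own ancient-Chinese BERT; as a minimal sketch of the fine-tuning setup, sentence segmentation can be cast as per-character token classification (label 1 marks a character after which a sentence boundary follows). The checkpoint name, example text and labels below are illustrative stand-ins, not the authors' data.

```python
# Minimal fine-tuning sketch (assumption): a generic Chinese BERT stands in for the
# authors' ancient-Chinese pre-trained model; segmentation is per-character labeling.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")   # placeholder checkpoint
model = BertForTokenClassification.from_pretrained("bert-base-chinese", num_labels=2)

text = "天命之谓性率性之谓道修道之谓教"                     # unpunctuated ancient text
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]      # gold boundaries (illustrative)

enc = tokenizer(list(text), is_split_into_words=True, return_tensors="pt")
gold = torch.tensor([[-100] + labels + [-100]])              # ignore [CLS]/[SEP] positions

loss = model(**enc, labels=gold).loss                        # fine-tuning objective
loss.backward()
```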
  • Knowledge Representation and Acquisition
    FENG Xiaolan, ZHAO Xiaobing
    2019, 33(11): 64-72.
    Tourism is one of the main economic sources in the Tibetan region. However, there is no intelligent Tibetan tourism information service system on the Internet, and introductory texts on Tibetan attractions are also rare. In contrast, Chinese tourism websites contain a large amount of information covering many attractions. To facilitate access to attraction-related knowledge, this paper first uses a BLSTM neural network model to acquire 11 kinds of attribute knowledge about scenic spots in the Chinese tourism domain. Through a Chinese-Tibetan tourism dictionary, the acquired Chinese knowledge is then transferred to Tibetan, with a translation coverage of 70.44%. Finally, a Chinese-Tibetan bilingual tourism knowledge graph is constructed.
  • Knowledge Representation and Acquisition
    ZHU Yanli, YANG Xiaoping, WANG Liang, ZHANG Zhiyu
    2019, 33(11): 73-82.
    Knowledge graph embedding maps entities and relations into low-dimensional vector spaces. Existing embedding methods have two major drawbacks in modeling knowledge graphs with asymmetric characteristics. First, they do not consider the asymmetry between head and tail entities, assuming that head and tail entities in knowledge graphs come from the same semantic space. Second, they equip each relation with a set of unique projection matrices, ignoring the intrinsic correlations among relations, which hinders the sharing of knowledge between projection matrices and causes poor generalization ability. This paper proposes a novel embedding approach named TransRD to deal with these two issues. TransRD adopts different projection matrices for head and tail entities, respectively, and applies the ADADELTA algorithm to adjust the learning rate adaptively. It then uses the same pair of transfer matrices for similar relations to improve the performance of knowledge representation. Empirical results of link prediction on WN18 and FB15K (public knowledge graph datasets) and MPBC_20 (a subset of the Knowledge Graph of Breast Cancer) show that TransRD achieves remarkable improvements in various aspects compared with existing models.
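    The abstract does not spell out TransRD's scoring function; the rough sketch below only illustrates the idea of projecting head and tail entities with two different relation-specific matrices before a TransE-style translation score, and should not be read as the paper's exact formulation.

```python
# Rough sketch (assumption): head/tail-specific projections followed by a translation
# score; all vectors and matrices are random stand-ins, and the functional form is
# only illustrative of the idea described in the abstract.
import numpy as np

def score(h, r, t, M_head, M_tail):
    """Lower is better: || M_head·h + r - M_tail·t ||_2 (illustrative form only)."""
    return np.linalg.norm(M_head @ h + r - M_tail @ t)

dim = 4
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, dim))
M_head, M_tail = rng.normal(size=(2, dim, dim))   # could be shared across similar relations
print(round(score(h, r, t, M_head, M_tail), 3))
```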
  • Knowledge Representation and Acquisition
    ZHAO Yu, TAN Haining, LIU Zhifang, WU Chao
    2019, 33(11): 83-94.
    Due to the abundant structural and semantic information in heterogeneous information networks and their wide application, network representation learning for heterogeneous information networks has become a vital research issue. Current representation learning models for heterogeneous information networks can be divided into generative-model-based and discriminative-model-based methods. In this paper, we propose a representation learning model for heterogeneous information networks called HINGAN, which integrates a generative adversarial network into the representation learning process to improve the learned representations. The model first builds a weighted homogeneous information network under the guidance of meta-paths. Then, through the adversarial minimax game, it updates the parameters of the generator and discriminator constructed on the weighted network. Experimental results on AMiner and DBLP show that HINGAN outperforms current mainstream network representation methods on both multi-label classification and visualization. At the same time, HINGAN can be applied to scalable representation and efficient computation of heterogeneous network data.
  • Information Extraction and Text Mining
    YIN Zhangzhi, LI Xinzi, HUANG Degen, LI Jiuyi
    2019, 33(11): 95-100,106.
    Named entity recognition (NER) plays an important role in natural language processing. In order to obtain better results without manual features, this paper proposes an NER method based on an ensemble of BiLSTM models. Firstly, we train BiLSTM-CRF models on the data, obtaining the character-based model Char-NER and the word-based model Word-NER, respectively. Then the score vectors produced by the two models are merged as the input to an SVM model. Without hand-crafted features, this method achieves F-scores of 94.04%, 92.15% and 87.05% for person, location and organization names on the 1998 People's Daily corpus, and 91.73%, 93.20% and 83.15% on the MSRA corpus, respectively.
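    As a minimal sketch of the ensemble step, the per-token score vectors from the two BiLSTM-CRF models can be concatenated and fed to an SVM that makes the final decision. The score arrays and tag inventory below are random stand-ins, not the paper's data.

```python
# Minimal ensemble sketch (assumption): concatenate Char-NER and Word-NER score vectors
# and train an SVM on the merged features; all arrays here are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_tokens, n_tags = 200, 7
char_scores = rng.normal(size=(n_tokens, n_tags))   # scores from the character-based model
word_scores = rng.normal(size=(n_tokens, n_tags))   # scores from the word-based model
tags = rng.integers(0, n_tags, size=n_tokens)       # gold tags (illustrative)

features = np.hstack([char_scores, word_scores])    # merged score vectors
clf = SVC(kernel="rbf").fit(features[:150], tags[:150])
print("held-out accuracy:", clf.score(features[150:], tags[150:]))
```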
  • Information Extraction and Text Mining
    LIN Siqi, YU Zhengtao, GUO Junjun, GAO Shengxiang
    2019, 33(11): 101-106.
    This paper proposes a Chinese-Vietnamese bilingual news perspective sentence extraction method that incorporates multiple features. Firstly, to address the problem of unbalanced resources between Chinese and Vietnamese, the method constructs a Chinese-Vietnamese bilingual word embedding model, using rich Chinese annotation resources to make up for the lack of Vietnamese annotation resources. Then, the sentiment, topic and position features of sentences are integrated into the word vectors and the attention mechanism, respectively. Experiments show that this method can effectively improve the accuracy of Vietnamese news perspective sentence extraction.
  • Information Extraction and Text Mining
    YIN Hong, CHEN Yan, LI Ping
    2019, 33(11): 107-114.
    Key-phrase extraction aims to automatically identify important key phrases in documents. Most existing methods focus on the importance of words and the relations between words. Considering that key phrases should be closely related to the article's topics, we propose an improved method based on topic entropy. Our work first uses Latent Dirichlet Allocation to learn the topic distributions of documents and words, and combines them to obtain a word's topic distribution within a specific document. Then the words' topic entropy is computed to represent their importance. Finally, we run a random walk on the word co-occurrence graph to calculate the score of each candidate phrase. Experimental results show that the proposed method improves the F1 score by 2.61%-6.98% compared with existing methods.
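    A minimal sketch of the topic entropy computation is shown below, taking it as the Shannon entropy of a word's document-specific topic distribution; the distributions are made up, and reading lower entropy as higher topical importance is an interpretation of the abstract rather than the paper's stated weighting.

```python
# Minimal sketch (assumption): topic entropy of a word as the Shannon entropy of its
# topic distribution; a peaked distribution (low entropy) marks a topically focused word.
import numpy as np

def topic_entropy(p):
    """p: topic distribution of a word within a document, summing to 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

print(topic_entropy([0.9, 0.05, 0.05]))          # topically focused word -> low entropy
print(topic_entropy([0.25, 0.25, 0.25, 0.25]))   # topic-neutral word -> high entropy
```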
  • Information Extraction and Text Mining
    WEI Wancheng, HUANG Wenming, WANG Jing, DENG Zhenrong
    2019, 33(11): 115-124.
    This paper proposes a novel multi-task learning model for the automatic generation of classical poetry and couplets, using an encoder-decoder structure with an attention mechanism. The encoder consists of two BiLSTMs, one for the keyword input and the other for the classical poetry and couplet input. The decoder consists of two LSTMs, one for the poetry output and the other for the couplet output. In the multi-task learning model, the encoder parameters are shared while the decoder parameters are not, so the encoder learns the common features of classical poetry and couplets while each decoder learns their unique features. As a result, the generalization ability of the model is enhanced, and its performance is much better than that of single-task models. At the same time, this paper introduces keyword information into the model, so that the generated poetry and couplets are consistent with the user's intention. Finally, automatic and manual evaluations are used to verify the effectiveness of the method.
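    The sketch below only illustrates the parameter-sharing pattern described in the abstract, i.e. one shared encoder with two task-specific decoders; the attention mechanism and the separate keyword encoder are omitted, and all dimensions are assumptions.

```python
# Minimal multi-task seq2seq sketch (assumption): shared BiLSTM encoder, two LSTM decoders
# (poetry vs. couplet); a crude mean-pooled context replaces the paper's attention.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)  # shared
        self.dec_poem = nn.LSTM(emb + 2 * hidden, hidden, batch_first=True)        # task 1
        self.dec_couplet = nn.LSTM(emb + 2 * hidden, hidden, batch_first=True)     # task 2
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt, task="poem"):
        enc, _ = self.encoder(self.embed(src))
        ctx = enc.mean(dim=1, keepdim=True).expand(-1, tgt.size(1), -1)   # pooled context
        dec_in = torch.cat([self.embed(tgt), ctx], dim=-1)
        decoder = self.dec_poem if task == "poem" else self.dec_couplet
        h, _ = decoder(dec_in)
        return self.out(h)

model = SharedEncoderMTL(vocab=5000)
logits = model(torch.randint(0, 5000, (2, 8)), torch.randint(0, 5000, (2, 7)), task="couplet")
print(logits.shape)  # (2, 7, 5000)
```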
  • Information Retrieval and Question Answering
    ZHAO Chang, LI Huiying
    2019, 33(11): 125-133.
    Entity linking for knowledge base question answering links the entity mention in a natural language question to the target entity in the knowledge base. This paper employs the candidate entity's types, relations and neighboring entities as the candidate entity representation, so as to alleviate the problem of insufficient descriptive information about entities in the knowledge base. At the same time, similar entity mentions obtained from the training corpus are treated as the mention's background knowledge. Finally, the proposed features are combined with an entity popularity feature to resolve entity ambiguity. The experimental results show that a linear combination of all the above features performs better than any single feature.
  • Information Retrieval and Question Answering
    TAN Hongye, WU Zepeng, LU Yu, DUAN Qinglong, LI Ru, ZHANG Hu
    2019, 33(11): 134-142.
    Automatic short answer grading (ASAG) is a key issue in intelligent tutoring systems. The main challenges in ASAG are that 1) the reference answer for a given question cannot cover the diverse student answers, and 2) the similarity between a student answer and the reference is hard to estimate. This paper applies clustering and maximum similarity to select representative answers, constructing a reference answer set that covers the variety of student answers. Then, a deep neural network model based on the attention mechanism is employed to estimate the similarity between a student answer and the reference answer set. Experimental results show that the proposed model effectively improves the accuracy of automatic scoring.
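    As a minimal sketch of the representative-answer selection step, student-answer embeddings can be clustered and the answer closest to each centroid kept as a representative for the expanded reference set. The embeddings and the number of clusters below are hypothetical, and the attention-based similarity model is not sketched here.

```python
# Minimal sketch (assumption): k-means over stand-in answer embeddings, then take the
# answer nearest each centroid as a representative for the reference answer set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

rng = np.random.default_rng(0)
answer_vecs = rng.normal(size=(300, 64))            # stand-in embeddings of student answers

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(answer_vecs)
rep_idx = pairwise_distances_argmin(kmeans.cluster_centers_, answer_vecs)

print("indices of representative answers:", rep_idx)   # one representative per cluster
```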