Highlights
  • Question-answering and Dialogue
    WANG Mengyu, YU Dingyao, YAN Rui, HU Wenpeng, ZHAO Dongyan
    . 2020, 34(8): 78-85.
    The multi-turn dialogue task requires the system to take context information into account while generating fluent answers. Recently, a large number of multi-turn dialogue models based on the HRED (Hierarchical Recurrent Encoder-Decoder) model have been developed, reporting good results on some English dialogue datasets such as Movie-DiC. Using a high-quality, real-world customer service dialogue corpus released to contestants by Jingdong in 2018, this article investigates the performance of the HRED model and explores possible improvements. The results show that combining the attention and ResNet mechanisms with the HRED model achieves significant improvements.
  • Survey
    FENG Yang, SHAO Chenze
    . 2020, 34(7): 1-18.
    Machine translation is the task of using a computer to translate a source language into a target language with equivalent meaning, and it has become an important research direction in the field of natural language processing. Neural machine translation models, now the mainstream in the research community, can perform end-to-end translation from source language to target language. In this paper, we select several main research directions of neural machine translation, including model training, simultaneous translation, multi-modal translation, non-autoregressive translation, document-level translation, domain adaptation, and multilingual translation, and briefly introduce the research progress in these directions.
  • Survey
    WEI Zhongyu, FAN Zhihao, WANG Ruize, CHENG Yijing, ZHAO Wangrong, HUANG Xuanjing
    . 2020, 34(7): 19-29.
    In recent years, cross-modality research, especially research on vision and language, has attracted increasing attention. This survey focuses on the task of image captioning and summarizes the literature from four aspects: the overall architecture, some key questions for cross-modality research, the evaluation of image captioning, and the state-of-the-art approaches to image captioning. In conclusion, we suggest three directions for future research, i.e., cross-modality representation, automatic evaluation metrics, and diverse text generation.
  • Survey
    TU Kewei, LI Jun
    . 2020, 34(7): 30-41.
    Syntactic parsing aims to analyze an input sentence for its syntactic structure. It is one of the most classic tasks in natural language processing. Current research on syntactic parsing focuses on improving the accuracy of syntactic parsers via automatic learning from data. This paper surveys recent developments in syntactic parsing, classifies and introduces the new approaches and new discoveries of the past year in three subareas (supervised parsing, unsupervised parsing, and cross-domain/cross-language parsing), and finally discusses the future prospects of syntactic parsing research.
  • Information Extraction and Text Mining
    NIE Jinran, WEI Jiaolong, TANG Zuping
    . 2020, 34(7): 79-88.
    As a controllable text generation task, text style transfer has attracted more and more attention in recent years. Based on the variational auto-encoder model, the content and style of source sentences are separated in the latent space through adversarial training between the discriminator and the variational auto-encoder. Because representing style with a fixed binary vector is a limitation of this method, we propose a more fine-grained joint representation method, which combines the latent variable extracted from an independent encoder with a style label to improve the accuracy of style transfer. Experimental results show that the joint representation method achieves higher accuracy than two baseline models on Yelp, a common dataset in the style transfer field.
  • Language Resources Construction
    GE Shili, SONG Rou
    . 2020, 34(6): 27-35.
    An English-Chinese clause alignment corpus serves the study and application of grammatical structure correspondence between English and Chinese clauses. It is of great significance to linguistic theory and to translation (both human translation and machine translation). Previous work on grammar theory and corpora lacks sufficient research on the definitions of clause and clause complex; it is theoretically deficient and can hardly support natural language processing applications. This paper first makes theoretical preparations for the construction of an English-Chinese clause alignment corpus. Starting from the theory of the Chinese clause complex put forward in recent years, it defines the concept of component sharing, and further defines the English clause and clause complex based on naming sharing and quotation sharing, which endows the clause and the clause complex with integrity and unity. Based on this study, an English-Chinese clause alignment annotation system is designed, including English NT clause tagging and Chinese translation generation and combination. The corpus annotation shows that, at the clause complex level, the components involved in structural transformation in English-Chinese translation can be limited to English clauses and their related namings and tellings, without involving the internal structure of the namings and tellings. Based on this work, the English-Chinese clause aligned corpus provides research samples for linguistic research, English-Chinese language comparison, and English-Chinese machine translation.
  • Language Analysis and Calculation
    DU Jiaju, QI Fanchao, SUN Maosong, LIU Zhiyuan
    . 2020, 34(5): 1-9.
    Sememes, defined in linguistics as the minimum semantic units of human languages, have been proven useful in many NLP tasks. Since manually constructing and updating sememe knowledge bases (KBs) is costly, the task of automatic sememe prediction has been used to assist sememe annotation. In this paper, we explore the method of applying dictionary definitions to predicting sememes for unannotated words. We find that the sememes of each word are usually semantically related to different words in its dictionary definition, and we name this matching relationship local semantic correspondence. Accordingly, we propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes. Evaluated on HowNet, our model achieves state-of-the-art performance and properly learns the local semantic correspondence between sememes and words in dictionary definitions.
  • Language Analysis and Calculation
    DAI Yuling, DAI Rubing, FENG Minxuan, LI Bin, QU Weiguang
    . 2020, 34(4): 21-29.
    Function words have rich grammatical meanings and are crucial to sentence comprehension. Existing linguistic research on function words cannot be directly adopted by computational linguistics because it lacks formal representation. In this paper, to represent their syntactic and semantic information, we align words and conceptual relations in the abstract meaning representation (AMR) based on concept graphs, so that function words correspond to nodes or to arcs between conceptual nodes. Then, 8,587 sentences from PEP primary school Chinese textbooks are selected for AMR annotation. Among the total of 24,801 function word tokens in this corpus, 58.80% are prepositions, conjunctions and structural auxiliaries, which correspond to relations between concepts, and 41.20% are modals and aspects, which express concepts. This shows that AMR represents function words dynamically, providing better theory and resources for the syntactic and semantic analysis of whole sentences.
  • Machine Translation
    LI Peiyun, LI Maoxi, QIU Bailian, WANG Mingwen
    . 2020, 34(3): 56-63.
    The word embeddings of BERT contain semantic, syntactic and contextual information, and are pre-trained for a variety of downstream natural language processing tasks. We propose to introduce BERT into neural quality estimation of MT outputs by employing a stacked BiLSTM (bidirectional long short-term memory) network, concatenated with the existing quality estimation network at the output layer. Experiments on the CWMT18 datasets show that quality estimation can be significantly improved by integrating the upper and middle layers of BERT, with the largest improvement brought by average pooling of the last four layers of BERT. Further analysis reveals that translation fluency is better exploited by BERT in the MT quality estimation task.
  • Sentiment Analysis and Social Computing
    SONG Shuangyong, WANG Chao, CHEN Chenglong, ZHOU Wei, CHEN Haiqing
    . 2020, 34(2): 80-95.
    AliMe is a recently developed chatbot focused on the intelligent customer service domain. Emotion analysis technologies have been successfully utilized in many modules of AliMe. The technical details of those emotion-analysis-based modules are presented, including user emotion detection, user emotion comfort, emotional generative chatting, customer service quality control, session satisfaction prediction, and intelligent entrance for manual customer service. Furthermore, some user interface examples of those emotional modules are also introduced to improve understanding of their effects.