Content of Natural Language Processing in our journal

  • NLP Application
    CHENG Yong, XU Dekuan, DONG Jun
    . 2020, 34(4): 101-110.
    Automatic grading of text reading difficulty aims to judge a text's difficulty level automatically from its features. In this paper, we propose a novel difficulty grading method based on multiple linguistic features and deep features. The method considers linguistic features at the character, vocabulary, and sentence levels, covering frequency, length, complexity, richness, and coherence. In addition, it uses a BERT-based pre-trained neural network model to extract deep features from the sentences of a text. On this basis, an end-to-end neural network is constructed to fuse the linguistic features and the deep features. Our method achieves good performance in automatic grading, outperforming methods based on traditional linguistic features and on popular neural networks.
  • NLP Application
    ZHANG Kai, LI Junhui, ZHOU Guodong
    . 2019, 33(3): 110-117.
    Because the publicly available large-scale image datasets carry manually labeled English captions, most studies on image captioning aim at generating captions in a single language (e.g., English). In this paper, we explore zero-resource image captioning to generate Chinese captions with English as the pivot language. Specifically, we propose and compare two approaches that take advantage of recent advances in neural machine translation. The first, the pipeline approach, generates an English caption for a given image and then translates it into Chinese. The second, the pseudo-training-set approach, first translates all English captions in the training and development sets into Chinese to obtain image-Chinese caption datasets, and then directly trains a model to generate Chinese captions for a given image. Experimental results show that the second approach, i.e., the character-based Chinese caption generation model trained on the pseudo-training set, is superior to the pipeline approach.
  • NLP Application
    ZHANG Chenlin, WANG Mingwen, TAN Yiming, CHEN Zhiming, ZUO Jiali, LUO Yuansheng
    . 2019, 33(3): 118-125,135.
    As one of the Four Great Classical Novels, Journey to the West leaves many instances of foreshadowing open to interpretation. In this paper, we conduct a case study on the Monkey King using sentiment analysis. We apply two NLP technologies, automatic segmentation and sentiment lexicon collection, to calculate the sentiment of the Monkey King. By examining how his sentiment changes before and after the episode of “Real and Fake Monkey King”, we propose, among other points, that “Monkey King was not killed by Rulai, the supreme Buddha”, and that he became submissive to authority after the episode. This paper is a tentative exploration of sentiment analysis for literary studies.
  • NLP Application
    LIANG Jiannan, SUN Maosong, YI Xiaoyuan, YANG Cheng, CHEN Huimin, LIU Zhenghao
    . 2019, 33(3): 126-135.
    Jiju poetry is a special kind of Chinese classical poetry in which every line is selected from existing poems. As a form of artistic re-creation, the resulting poem should not only obey structural and phonological constraints, but also have an original theme, integrated content, and coherence. In this paper, we propose a novel automatic Jiju poetry generation model based on neural networks. We apply a recurrent neural network (RNN) to learn a vector representation of each poetry line, and then investigate different methods for measuring the contextual coherence of two lines. Both automatic and human evaluation results show that our model can generate high-quality Jiju poems, significantly outperforming the baseline models.
  • NLP Application
    YIN Heju, ZAN Hongying, CHEN Junyi, ZHAI Xinli
    . 2019, 33(3): 136-144.
    This article investigates automatic judgment of “traffic accident” civil cases in the legal field. A total of 14,000 samples are collected from the “China Judgment Document Network.” Three models are examined, i.e., an SVM-based model, a BI-GRU-based model, and an Attention+BI-GRU-based model, to classify the cases into four classes and eight classes, respectively. The experimental results show that the Attention+BI-GRU model ranks first with 80.26% F1 in the first task, while the BI-GRU model achieves 48.59% F1 in the second.
  • NLP Application
    MA Weizhi, ZHANG Min, ZHANG Chenyu, LIU Yiqun, MA Shaoping
    . 2018, 32(4): 137-144.
    Research on language cognition is often based on datasets of children's first-language vocabulary development, such as WordBank and other large-scale corpora. However, there is no large-scale dataset of second-language vocabulary development, and it is very difficult to collect a large dataset with traditional data collection methods. This limits the study of second-language learning and the comparison of first- and second-language learning. In this paper, we design a data collection framework for children, based on the idea of games with a purpose, to collect children's vocabulary development status and their attributes. So far we have implemented a second-language vocabulary development collection system for children learning English, and the system is now collecting data online.