Journal of Chinese Information Processing

NLP Application

Automatic Grading of Chinese Text Reading Difficulty Based on Multiple Linguistic Features and Deep Features

CHENG Yong, XU Dekuan, DONG Jun

2020, 34(4): 101-110.

Automatic grading of text reading difficulty aims to judge the difficulty level of a text automatically from its features. In this paper, we propose a novel difficulty grading method based on multiple linguistic features and deep features. The method considers linguistic features at the character, vocabulary, and sentence levels, covering frequency, length, complexity, richness, and coherence. In addition, it uses a BERT-based pre-trained neural network model to extract deep features of the sentences in a text. On this basis, an end-to-end neural network is constructed to fuse the linguistic features and the deep features. Our method achieves good performance in automatic grading, outperforming methods based on traditional linguistic features and on popular neural networks.

NLP Application

Image Caption via Pivot Language

ZHANG Kai, LI Junhui, ZHOU Guodong

2019, 33(3): 110-117.

Because large-scale image datasets with manually labeled English captions are publicly available, most studies on image captioning aim at generating captions in a single language (e.g., English). In this paper, we explore zero-resource image captioning, generating Chinese captions with English as the pivot language. Specifically, we propose and compare two approaches that take advantage of recent advances in neural machine translation. The first, the pipeline approach, generates an English caption for a given image and then translates that caption into Chinese. The second, the pseudo-training-set approach, first translates all English captions in the training and development sets into Chinese to obtain image-Chinese caption datasets, and then directly trains a model to generate a Chinese caption for a given image. Experimental results show that the second approach, i.e., the character-based Chinese caption generation model trained on the pseudo-training set, is superior to the pipeline approach.

NLP Application

A Case Study on Journey to the West Based on Sentiment Analysis

ZHANG Chenlin, WANG Mingwen, TAN Yiming, CHEN Zhiming, ZUO Jiali, LUO Yuansheng

2019, 33(3): 118-125,135.

As one of the Four Great Classical Novels, Journey to the West leaves much foreshadowing open to interpretation. In this paper, we conduct a case study of the Monkey King using sentiment analysis. We apply two NLP techniques, automatic word segmentation and sentiment lexicon collection, to calculate the sentiment of the Monkey King. By comparing his sentiment before and after the episode of “Real and Fake Monkey King”, we propose that “Monkey King was not killed by Rulai, the supreme Buddha”, and that he became submissive to authority after the episode. This paper is a tentative exploration of sentiment analysis for literary studies.

NLP Application

Neural Network-Based Jiju Poetry Generation

LIANG Jiannan, SUN Maosong, YI Xiaoyuan, YANG Cheng, CHEN Huimin, LIU Zhenghao

2019, 33(3): 126-135.

Jiju poetry is a special kind of Chinese classical poetry in which each line is selected from an existing poem. As a form of artistic re-creation, the resulting poem should not only obey structural and phonological constraints, but also have an original theme, integrated content, and coherence. In this paper, we propose a novel automatic Jiju poetry generation model based on neural networks. We apply a Recurrent Neural Network (RNN) to learn a vector representation of each poetry line, and then investigate different methods of measuring the contextual coherence of two lines. Both automatic and human evaluation results show that our model can generate high-quality Jiju poems, significantly outperforming the baseline models.

NLP Application

Study on Automatic Judgment of Traffic Accidents

YIN Heju, ZAN Hongying, CHEN Junyi, ZHAI Xinli

2019, 33(3): 136-144.

This article investigates automatic judgment of “traffic accident” civil cases in the legal field. A total of 14,000 samples are collected from the “China Judgment Document Network.” Three models are examined, i.e., an SVM-based model, a BI-GRU-based model, and an Attention+BI-GRU-based model, to classify the cases into four classes and eight classes, respectively. The experimental results show that the Attention+BI-GRU model ranks first with 80.26% F1 in the first task, while the BI-GRU model achieves 48.59% F1 in the second.

NLP Application

An Online Data Collecting Framework Via Game for Children Second Language Development

MA Weizhi, ZHANG Min, ZHANG Chenyu, LIU Yiqun, MA Shaoping

2018, 32(4): 137-144.

Language cognition research is often based on datasets of children's first language vocabulary development, such as WordBank and other large-scale corpora. However, there is no large-scale dataset of second language vocabulary development, and such a dataset is very difficult to collect with traditional data collection methods. This limits the study of second language learning and the comparison of first and second language learning. In this paper, we design a data collection framework for children, based on the idea of games with a purpose, to collect each child's vocabulary development status and attributes. So far, we have implemented a second language vocabulary development collection system for children learning English, and the system is currently collecting data online.
