Journal of Chinese Information Processing

Survey

A Survey of Multimodal Information Processing Frontiers: Application, Fusion and Pre-training

WU Youzheng, LI Haoran, YAO Ting, HE Xiaodong

2022, 36(5): 1-20.

Over the past decade, steady innovation and breakthroughs have pushed the limits of single-modality modeling, e.g., in vision, speech and language. Beyond this progress in single modalities, the rise of multimodal social networks, short-video applications, video conferencing, live video streaming and digital humans demands the development of multimodal intelligence and offers fertile ground for multimodal analysis. This paper reviews recent multimodal applications that have attracted intensive attention in the field of natural language processing, and summarizes the mainstream multimodal fusion approaches from the perspectives of single-modal representation, multimodal fusion stage, fusion network, fusion of unaligned modalities, and fusion of missing modalities. In addition, this paper elaborates on the latest progress in vision-language pre-training.

Best Paper: CCL2021

The Construction and Application of Ancient Chinese Corpus with Word Sense Annotation

SHU Lei, GUO Yiluan, WANG Huiping, ZHANG Xuetao, HU Renfen

2022, 36(5): 21-30.

Because monosyllabic words dominate ancient Chinese, polysemy is a challenge for modern readers trying to understand it. Based on the linguistic knowledge in traditional dictionaries, this paper designs principles for the sense division of polysemous words in ancient Chinese, and categorizes the knowledge of common monosyllabic words in ancient Chinese. Following these guidelines, the annotated corpus has grown to 38 700 sentences with more than 1 176 000 Chinese characters. Experiments show that a BERT-based word sense disambiguation model trained on the corpus achieves an accuracy of about 80%. Furthermore, this paper explores the application of the corpus and of word sense disambiguation to the study of language ontology and dictionary compilation, via diachronic analysis of the evolution of word meaning and the induction of sense families.

Best Paper: CCL2021

Chinese Word-Formation Prediction Based on Lexical Level Embedding

ZHENG Hua, LIU Yang, YIN Yaqi, WANG Yue, DAI Damai

2022, 36(5): 31-40,66.

Chinese is a paratactic language, and its word-formations, which designate how formation components combine to form words, are key to understanding semantics. In Chinese natural language processing, most existing work on word-formation prediction follows coarse-grained syntactic labels and uses inter-word features from the context, disregarding inner-word features such as morphemes and lexical semantics. In this paper, we follow the word-formation labels defined from the linguistic perspective and construct a formation-informed Chinese dataset. We then propose a Bi-LSTM-based model with self-attention to explore how inner- and inter-word features influence Chinese word-formation prediction. Experimental results show that our method achieves an accuracy of 77.87% and an F1 score of 78.36% on the word-formation task. Comparative analyses further show that morphemes (an inner-word feature) greatly improve the prediction results, whereas the context (an inter-word feature) performs the worst and is highly unstable.

Information Extraction and Text Mining

A Deep Distance Factorization Based Recommendation Algorithm

QIAN Mengwei, GUO Yi

2022, 36(5): 41-48.

To address the issue that the dot product adopted in matrix factorization cannot accurately measure users' preferences for items, a deep distance factorization model for recommender systems is proposed. First, the user-item rating matrix is converted into a distance matrix instead of being directly decomposed. Next, the distance matrix is fed into two deep neural networks by row and by column to obtain the distance feature vectors of users and items. Then, the distance between a user and an item is calculated from these feature vectors, and the error between the predicted and real distance values is minimized through the designed loss function. Finally, the predicted distance values are converted back into ratings. Experiments on different datasets show that the proposed algorithm outperforms other algorithms on the rating prediction task.
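
The rating-to-distance idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the conversion formula, the distance function, or the loss, so everything here is an assumption made for illustration: a linear mapping `distance = MAX_RATING - rating`, Euclidean distance between feature vectors, and a squared-error loss. The names `MAX_RATING`, `rating_to_distance`, `distance_to_rating`, `euclidean` and `squared_error` are hypothetical, and the hand-written vectors stand in for the outputs of the two deep networks.

```python
import math

MAX_RATING = 5.0  # assumed upper bound of the rating scale (hypothetical)


def rating_to_distance(rating, max_rating=MAX_RATING):
    """Assumed linear mapping: a higher rating means a smaller distance."""
    return max_rating - rating


def distance_to_rating(distance, max_rating=MAX_RATING):
    """Inverse of the assumed mapping, used to turn predictions into ratings."""
    return max_rating - distance


def euclidean(u, v):
    """Euclidean distance between a user vector and an item vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def squared_error(user_vec, item_vec, rating):
    """Squared error between the predicted distance and the target distance."""
    target = rating_to_distance(rating)
    predicted = euclidean(user_vec, item_vec)
    return (predicted - target) ** 2


# Stand-ins for the feature vectors that the two networks would produce.
user_vec = [0.0, 0.0]
item_vec = [0.0, 1.0]
predicted_rating = distance_to_rating(euclidean(user_vec, item_vec))
```

In this toy case the two vectors are one unit apart, so under the assumed mapping the predicted rating is one point below the maximum.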

Information Extraction and Text Mining

Open Relation Extraction Based on Unsupervised Ensemble Clustering

XIE Binhong, LI Yu, ZHAO Hongyan

2022, 36(5): 49-58.

Open relation extraction (OpenRE) aims to extract relations for facts from open-domain corpora. Most OpenRE methods are unsupervised and cluster semantically equivalent patterns into a relation cluster. To further improve clustering performance, we propose an unsupervised ensemble clustering framework (UEC), which combines unsupervised ensemble learning with an iterative clustering algorithm based on information measurement to create high-quality labels. These high-quality labels can be used as supervision to improve feature learning and the clustering process, which in turn yields better labels. Finally, through multiple rounds of iterative clustering, the relation types in the text can be effectively discovered. The experimental results on the FewRel and NYT-FB datasets show that UEC is superior to other mainstream OpenRE models, with F1 scores reaching 65.2% and 67.1%, respectively.

Machine Reading Comprehension

BERT Based Question Answering of Gaokao Chinese Reading Comprehension

YANG Zhizhuo, HAN Hui, ZHANG Hu, QIAN Yili, LI Ru

2022, 36(5): 59-66.

Reading comprehension question answering for the Chinese college entrance examination (Gaokao) is much more difficult than general reading comprehension question answering, and the training data for the task is relatively scarce, so deep learning methods cannot achieve satisfactory results. To solve these problems, this paper proposes a method based on BERT semantic representation for extracting answer candidate sentences in Gaokao reading comprehension. First, an improved MMR algorithm is used to filter the paragraphs; then the BERT model is applied to represent the sentences semantically; next, a softmax classifier is used to extract the answer candidate sentences; and finally the output of the BERT model is ranked with the PageRank algorithm. On the Chinese reading comprehension questions of the Beijing college entrance examination from the last ten years, the recall and accuracy of our method are 61.2% and 50.1%, respectively, which demonstrates its effectiveness.
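
For readers unfamiliar with MMR (Maximal Marginal Relevance), the following is a minimal sketch of the standard greedy algorithm, not the improved variant the paper describes, whose details the abstract does not give. The function name `mmr_select`, the trade-off parameter `lam`, and the toy similarity values are all hypothetical; in practice the similarities would come from some text representation.

```python
def mmr_select(query_sim, pairwise_sim, k, lam=0.7):
    """Greedy standard MMR: pick k items that are relevant to the query
    while penalizing similarity to items that were already picked.

    query_sim[i]       -- similarity of candidate i to the query
    pairwise_sim[i][j] -- similarity between candidates i and j
    lam                -- trade-off: 1.0 = relevance only, 0.0 = diversity only
    """
    selected = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any already-selected item.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: paragraphs 0 and 1 are near-duplicates, paragraph 2 is distinct.
query_sim = [0.9, 0.85, 0.3]
pairwise_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse = mmr_select(query_sim, pairwise_sim, k=2, lam=0.5)
relevance_only = mmr_select(query_sim, pairwise_sim, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate paragraph is skipped in favor of the distinct one, whereas `lam=1.0` reduces to plain relevance ranking.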

Machine Reading Comprehension

Deep Interactive Fusion Network for Multi-hop Reading Comprehension

ZHU Siqi, GUO Yi, WANG Yexiang

2022, 36(5): 67-75.

Multi-hop reading comprehension, which requires information from multiple documents, has attracted much attention. However, the interaction between paragraphs is rarely addressed, whether in gold paragraph selection or in question answering. In this paper, we propose a multi-paragraph deep interactive fusion network for multi-hop reading comprehension. First, we filter out paragraphs irrelevant to the query to reduce the impact of distractors on model performance. Then, the selected paragraphs are fed into a deep interactive fusion network that aggregates information from different paragraphs to produce the final answer. Experiments on the HotpotQA dataset demonstrate that our model improves EM by 18.5% and F1 by 18.47% over the baseline.
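
The EM and F1 figures above are answer-level metrics commonly used in extractive question answering. As a rough illustration only, the sketch below computes exact match and token-overlap F1 with a simplified normalization (lowercasing and whitespace tokenization); this simplification is an assumption, the official HotpotQA evaluation applies additional normalization, and the function names `exact_match` and `token_f1` are hypothetical.

```python
from collections import Counter


def exact_match(prediction, gold):
    """1.0 if the answers match after simplified normalization, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("the eiffel tower", "eiffel tower")
f1 = token_f1("the eiffel tower", "eiffel tower")
```

The example shows why the two metrics differ: a prediction with one extra token fails exact match but still receives partial credit under F1.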

Question Answering and Dialogue System

Zero Resource Online Update for Knowledge-Grounded Dialogue System

LIN Jiancheng, LIN Xiaochuan

2022, 36(5): 76-84,93.

Knowledge-grounded dialogue systems are designed to use external knowledge and conversation context to generate responses that conform to objective facts. Their online update, which is seldom addressed, is challenged by the zero-resource setting that results from the high cost of labeling dialogue corpora. This paper proposes a method that updates the model parameters in the zero-resource setting via pseudo data. We design different pseudo data generation strategies for different scenarios. Evaluated on the KdConv dataset, the proposed method is comparable to the use of human-annotated data in terms of knowledge utilization and topic relevance.

Question Answering and Dialogue System

Towards Better Response Selection in Dialogue via Rich Historical Information

SI Bowen, KONG Fang

2022, 36(5): 85-93.

Response selection for dialogue, which aims to select appropriate responses based on the existing dialogue, is a popular research topic in NLP. Existing studies fall short in two respects: 1) insufficient use of the correlation between historical information and candidate responses, and 2) insufficient mining of the latent semantic information in the dialogue history. To deal with the first issue, this paper considers both the historical information and the candidate response information in the dialogue, and uses a cross-attention mechanism to capture the relationship between them. For the second issue, this paper employs a multi-head self-attention mechanism to capture the latent semantic information of the conversation history from multiple perspectives, and a highway network to bridge the different kinds of information and preserve their integrity. Experiments show that the proposed method achieves an 88.66% R₁₀@1 score, a 90.06% R₁₀@2 score and a 95.15% R₁₀@5 score on the Ubuntu Corpus V1 dataset.
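
The R₁₀@k scores above measure how often the correct response is ranked within the top k of 10 candidates. The sketch below shows one common way to compute the metric, assuming exactly one correct response per dialogue context; the function names `recall_at_k` and `mean_recall_at_k` and the toy scores are hypothetical and are not taken from the paper.

```python
def recall_at_k(scores, positive_index, k):
    """1.0 if the correct candidate is among the k highest-scored ones."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if positive_index in ranked[:k] else 0.0


def mean_recall_at_k(examples, k):
    """Average recall@k over (scores, positive_index) pairs."""
    return sum(recall_at_k(s, p, k) for s, p in examples) / len(examples)


# Two toy contexts with 10 candidate responses each; index 0 is correct.
examples = [
    ([0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.05], 0),  # ranked 1st
    ([0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05], 0),  # ranked 2nd
]
r10_at_1 = mean_recall_at_k(examples, k=1)
r10_at_2 = mean_recall_at_k(examples, k=2)
```

Because the metric is monotone in k, R₁₀@1 can never exceed R₁₀@2, which matches the ordering of the reported scores.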

Question Answering and Dialogue System

GES: Graph-based Evidence Selection for Multi-hop Question Generation

PANG Zexiong, ZHANG Qi

2022, 36(5): 94-101.

Multi-hop question generation is the task of reasoning over disjoint pieces of information and then generating complex questions. For a given question-answer pair, the context contains a large number of redundant and irrelevant sentences, and most previous methods require an annotated corpus to select the supporting facts used as input for generating the corresponding questions. To address this problem, this paper proposes a Graph-based Evidence Selection network (GES) for deep question generation over documents. The proposed model selects informative sentences from disjoint paragraphs, and this selection serves as an inductive bias that refines question generation. We also employ a straight-through estimator to train the model in an end-to-end manner. Experimental results on the HotpotQA dataset demonstrate that our proposed solution outperforms state-of-the-art methods by a significant margin.

Question Answering and Dialogue System

A Two-Stage Dialogue Generation Model Based on Affective Variables

FENG Guangjing, LIU Zhen, LIU Tingting, XU Gen, ZHUANG Yin, WANG Yuanyi, CHAI Yanjie

2022, 36(5): 102-111.

Emotional dialogue generation has become one of the popular topics in natural language processing. It can improve human-computer interaction, but existing affective dialogue generation models use only a single affective variable and tend to generate boring responses. To ensure that responses are not only semantically correct but also diverse, a two-stage dialogue generation model is proposed in this paper. In the first stage, DialoGPT, with its powerful language understanding capabilities, is used to ensure that semantically correct responses can be generated. Main emotional variables and mixed emotional variables are fused into global emotional variables to address the problem of boring responses. In the second stage, the global emotional variable is used to rewrite the response generated in the first stage, so as to polish the statement. Experimental results show that the proposed model performs better on the Empathetic Dialogues dataset than the baseline models.

Sentiment Analysis and Social Computing

Dynamic Fusion of Multi-modal Heterogeneous Data for Sentiment Analysis

DING Jian, YANG Liang, LIN Hongfei, WANG Jian

2022, 36(5): 112-124.

In recent years, sentiment analysis has been extended to multi-modal data, and the dynamic, rather than static, interaction of the intra-modality data is worth exploring. This paper proposes a dynamic fusion method for heterogeneous multi-modal emotional stream data to fully capture the interaction between modalities. Using a multi-task learning strategy, the heterogeneous dynamic fusion network is combined with a single-modality self-supervised learning network to obtain the consistency and difference characteristics of the modalities. Experiments on the CMU-MOSI and CMU-MOSEI datasets indicate the advantage of the proposed method over mainstream models, as well as its interpretability.

Sentiment Analysis and Social Computing

A Weighted Dependency Tree Convolutional Networks for Aspect-Based Sentiment Analysis

YANG Chunxia, SONG Jinjian, YAO Sicheng

2022, 36(5): 125-132.

For aspect-based sentiment analysis, existing rule-based dependency tree pruning methods may delete useful information. In addition, how to use a graph convolutional network to obtain the rich global information in the graph structure remains an important open problem. For the first problem, we use a multi-head attention mechanism to automatically learn to focus selectively on the structural information that is useful for the classification task, and transform the original dependency tree into a fully connected edge-weighted graph. To solve the second problem, we introduce dense connections into the graph convolutional network so that it can capture rich local and global information. The experimental results on three public datasets show that both the accuracy and F₁ of the proposed model improve over the baseline model.

Sentiment Analysis and Social Computing

Application of Modeling Multi-aspects Dependencies in Aspect-level Sentiment Classification

ZHANG Li, XIAO Zhiyong

2022, 36(5): 133-144.

Aspect-level sentiment classification aims to accurately identify the emotional polarity of aspects in a sentence. To effectively model the dependencies between multiple aspects in one sentence, this paper proposes a graph convolutional network (GCN) approach. First, the aspects are encoded together with the context by an attention mechanism. Then, a multi-aspect dependency graph is constructed from the dependency syntax tree, and a GCN is applied to the graph to model the dependencies between the aspects in one sentence. Finally, sentiment classification is performed using the aspect representations generated by the GCN. Experiments on the Restaurant and Laptop datasets of SemEval 2014 Task 4 show that the proposed model achieves a significant improvement over standard GCN models.

Sentiment Analysis and Social Computing

Multi-modal Emotion Recognition Based on Multi-LSTMs Fusion

ZHANG Yawei, WU Liangqing, WANG Jingjing, LI Shoushan

2022, 36(5): 145-152.

Sentiment analysis is a popular research topic in the field of natural language processing, and multimodal sentiment analysis is the current challenge in this task. Existing studies fall short in capturing context information and in combining information streams of different models. This paper proposes a novel multi-LSTMs Fusion Model Network (MLFN), which performs deep fusion across the three modalities of text, voice and image via an internal feature extraction layer for single modalities and an inter-modal fusion layer for dual-modal and tri-modal combinations. This hierarchical LSTM framework takes into account the information features within each modality while capturing the interaction between modalities. Experimental results show that the proposed method can better integrate multi-modal information and significantly improve the accuracy of multi-modal emotion recognition.

Sentiment Analysis and Social Computing

Sen-BiGAT-Inter: A Method for Emotion-Cause Pair Extraction

FENG Haojia, LI Yang, WANG Suge, FU Yujie, MU Yongli

2022, 36(5): 153-162.

Emotion-cause pair extraction aims to extract the emotion clause and the cause clause at the same time. For this task, the existing method based on a single graph attention network does not emphasize the semantic representation of emotion words in the encoding layer. This paper proposes the Sen-BiGAT-Inter method, which uses a sentiment lexicon, graph networks and multi-head attention. The proposed method uses the sentiment lexicon to merge each clause with the emotion words it contains, and uses the pre-trained model BERT (Bidirectional Encoder Representations from Transformers) to obtain the clause representation. Then, we build two graph attention networks to learn the representations of the emotion clause and the cause clause, respectively, and obtain the representation of each candidate emotion-cause pair. On this basis, we identify the emotion-cause pairs with causality by using multi-head attention to learn the global information of the candidate sentence pairs and combining it with relative position information to obtain the final pair representation. The experimental results on a Chinese emotion-cause pair extraction dataset show that the proposed model improves the F₁ value by about 1.95 compared with the current best results.

Sentiment Analysis and Social Computing

Modeling External Factors for Information Cascade Prediction via Graph Attention Network

YANG Caipiao, BAO Peng, LI Xuanya

2022, 36(5): 163-172.

Current information cascade prediction methods ignore the evolution of the diffusion cascade and individuals' behavior preferences under the influence of external factors, as well as the graph structure of the social network. To address these issues, this paper proposes a method for modeling external factors in information diffusion based on a graph attention network. The model applies a graph attention mechanism to extract the underlying structural information in the social graphs. Convolutional neural networks are adopted to analyze the temporal information in the diffusion cascade and capture the external influence. A recurrent neural network is employed to model the diffusion path. Finally, the model uses the different responses of individuals to the same external factors to predict the next node in the cascade. Experimental results on three real-world datasets from Twitter, Douban, and Memetracker show that the proposed model outperforms state-of-the-art methods.

2022 Volume 36 Issue 5 Published: 17 June 2022