Best Paper: CCL2021
ZHENG Hua, LIU Yang, YIN Yaqi, WANG Yue, DAI Damai.
2022, 36(5): 31-40, 66.
As a paratactic language, Chinese relies on word formation, i.e., how formation components combine into words, as a key to understanding semantics. In Chinese natural language processing, most existing work on word-formation prediction follows coarse-grained syntactic labels and uses inter-word features from the context, disregarding inner-word features such as morphemes and lexical semantics. In this paper, we adopt word-formation labels defined from a linguistic perspective and construct a formation-informed Chinese dataset. We then propose a Bi-LSTM-based model with self-attention to explore how inner- and inter-word features influence Chinese word-formation prediction. Experimental results show that our method achieves high accuracy (77.87%) and F1 score (78.36%) on the word-formation task. Comparative analyses further show that morphemes (an inner-word feature) greatly improve prediction, whereas context (an inter-word feature) performs the worst and is markedly unstable.
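The abstract describes a Bi-LSTM encoder with self-attention pooling that classifies a word's formation type from its inner components. The sketch below is a minimal illustrative PyTorch version of that general architecture, not the authors' implementation: the vocabulary size, dimensions, label count, and all names are hypothetical.

```python
# Hypothetical sketch: a Bi-LSTM + self-attention classifier for
# word-formation prediction. All sizes and names are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn

class FormationClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32,
                 hidden_dim=64, num_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-LSTM encodes the sequence of inner-word components
        # (e.g., characters/morphemes) in both directions.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Self-attention scores each time step, then pools the
        # LSTM states into a single word representation.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))       # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        pooled = (weights * h).sum(dim=1)             # (B, 2H)
        return self.out(pooled)                       # (B, num_labels)

model = FormationClassifier()
# One word made of three components, encoded as toy component ids.
logits = model(torch.tensor([[3, 7, 2]]))
print(logits.shape)  # one score per formation label
```

The attention weights make the pooling interpretable: components that matter more for the formation type receive larger weights, which fits the paper's focus on inner-word features such as morphemes.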