Best Paper: CCL2021
ZHENG Hua, LIU Yang, YIN Yaqi, WANG Yue, DAI Damai.
2022, 36(5): 31-40, 66.
As a paratactic language, Chinese relies on word formation, i.e., how formation components combine into words, as a key to understanding semantics. In Chinese natural language processing, most existing work on word-formation prediction follows coarse-grained syntactic labels and uses inter-word features from the context, disregarding inner-word features such as morphemes and lexical semantics. In this paper, we adopt word-formation labels defined from a linguistic perspective and construct a formation-informed Chinese dataset. We then propose a Bi-LSTM-based model with self-attention to explore how inner- and inter-word features influence Chinese word-formation prediction. Experimental results show that our method achieves high accuracy (77.87%) and F1 score (78.36%) on the word-formation task. Comparative analyses further show that morphemes (an inner-word feature) greatly improve prediction, whereas context (an inter-word feature) performs the worst and is markedly unstable.
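The abstract describes a Bi-LSTM encoder with self-attention pooling that classifies a word's formation type from its inner components. The sketch below is a minimal illustrative PyTorch version of that general architecture, not the authors' implementation: the vocabulary size, dimensions, label count, and all names are hypothetical.

```python
# Hypothetical sketch: a Bi-LSTM + self-attention classifier for
# word-formation prediction. All sizes and names are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn

class FormationClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32,
                 hidden_dim=64, num_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Bi-LSTM encodes the sequence of inner-word components
        # (e.g., characters/morphemes) in both directions.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Self-attention scores each time step, then pools the
        # LSTM states into a single word representation.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.out = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))       # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        pooled = (weights * h).sum(dim=1)             # (B, 2H)
        return self.out(pooled)                       # (B, num_labels)

model = FormationClassifier()
# One word made of three components, encoded as toy component ids.
logits = model(torch.tensor([[3, 7, 2]]))
print(logits.shape)  # one score per formation label
```

The attention weights make the pooling interpretable: components that matter more for the formation type receive larger weights, which fits the paper's focus on inner-word features such as morphemes.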