Separated Word Recognition in Chinese

ZHOU Lu, QU Weiguang, WEI Tingxin, ZHOU Junsheng, LI Bin, GU Yanhui

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 6: 25-32
Language Analysis and Computation

Abstract

Separated words (离合词) are a distinctive phenomenon in Chinese: other constituents can be inserted between the two morphemes of a word, yet the separated form still expresses a single, unified meaning. This paper treats the automatic recognition of separated two-character verbs as character-level sequence labeling, which avoids propagating errors from automatic word segmentation and part-of-speech tagging. A masking mechanism hides the separated word in the sentence so that the model learns more from the inserted components, and the first and second morphemes receive different masks to emphasize their order. A dual-encoder model encodes the original sentence and the masked sentence separately. Experimental results show that the proposed BERT_MASK + 2BiLSTMs + CRF model improves F1 by 2.85% over the best-performing existing model for separated word recognition.
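The abstract names two input-side ideas: labeling at the character level, and masking the two morphemes with different tokens so that a second encoder sees a masked copy of the sentence. The sketch below illustrates both in plain Python under stated assumptions. The tag set (`B-SW`, `E-SW`, `O`), the mask tokens `[M1]`/`[M2]`, and all helper names are hypothetical, not taken from the paper. The morpheme positions are supplied by hand because the abstract does not say how candidate words are located. The sketch only prepares inputs; the BERT, BiLSTM and CRF layers are not shown.

```python
from typing import List, Tuple

# Hypothetical special tokens: the abstract says the first and second
# morphemes get different masks but does not name them.
FIRST_MASK = "[M1]"
SECOND_MASK = "[M2]"


def check_span(chars: List[str], first: int, second: int) -> None:
    """Require two ordered, distinct morpheme positions inside the sentence."""
    if not (0 <= first < second < len(chars)):
        raise ValueError(f"invalid morpheme positions: {first}, {second}")


def tag_chars(chars: List[str], first: int, second: int) -> List[str]:
    """Assumed character-level tag scheme (not the paper's own):
    B-SW marks the first morpheme, E-SW the second, and O everything else,
    including the material inserted between them."""
    check_span(chars, first, second)
    tags = ["O"] * len(chars)
    tags[first] = "B-SW"
    tags[second] = "E-SW"
    return tags


def mask_morphemes(chars: List[str], first: int, second: int) -> List[str]:
    """Hide both morphemes so a model must rely on the inserted material and
    context; two distinct tokens keep the first/second order visible."""
    check_span(chars, first, second)
    masked = list(chars)  # copy, so the original sequence stays intact
    masked[first] = FIRST_MASK
    masked[second] = SECOND_MASK
    return masked


def dual_inputs(
    sentence: str, first: int, second: int
) -> Tuple[List[str], List[str], List[str]]:
    """Return (original chars, masked chars, gold tags): the two sequences a
    dual-encoder model would encode separately, plus training labels."""
    chars = list(sentence)
    return chars, mask_morphemes(chars, first, second), tag_chars(chars, first, second)


if __name__ == "__main__":
    # 洗澡 "take a bath", separated by 了个 in 洗了个澡
    original, masked, tags = dual_inputs("洗了个澡", 0, 3)
    print(original)  # ['洗', '了', '个', '澡']
    print(masked)    # ['[M1]', '了', '个', '[M2]']
    print(tags)      # ['B-SW', 'O', 'O', 'E-SW']
```

Copying the character list before masking keeps the unmasked sequence available for the first encoder, which matches the abstract's point that both versions of the sentence are encoded.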

Key words

separated words / automatic recognition / MASK mechanism / neural network

Cite this article

ZHOU Lu, QU Weiguang, WEI Tingxin, ZHOU Junsheng, LI Bin, GU Yanhui. Separated Word Recognition in Chinese. Journal of Chinese Information Processing. 2023, 37(6): 25-32

Funding

National Social Science Fund of China (21&ZD288)