Abstract
Chinese passive sentences can be classified into marked and unmarked passive sentences according to whether a passive marker is present. Because their forms are complex and diverse, they pose significant challenges to natural language understanding, so automatically recognizing Chinese passive sentences is important for downstream natural language processing tasks. In this paper, we construct a corpus of passive sentences and propose PC-BERT-CNN, a model that integrates part-of-speech and verb argument frame information, for the automatic recognition of Chinese passive sentences. Experimental results show that the proposed model recognizes Chinese passive sentences accurately, achieving an F1 score of 98.77% on marked passive sentences and 96.72% on unmarked passive sentences.
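The abstract does not describe how PC-BERT-CNN fuses its inputs, so the following is only a minimal sketch of one plausible design, not the authors' implementation. It assumes that each token's contextual vector (random stand-ins here for BERT output) is concatenated with a one-hot part-of-speech tag and a one-hot verb-argument role, and that the fused sequence is classified by a Kim-style text CNN (1-D convolutions, ReLU, max-over-time pooling, sigmoid). The tag inventories `POS_TAGS` and `ARG_ROLES`, the helpers `one_hot` and `fuse_token_features`, and the class `TinyTextCNN` are all hypothetical, and the weights are untrained.

```python
import math
import random

# Hypothetical tag inventories; the paper's actual POS tag set and
# argument-frame labels are not given in the abstract.
POS_TAGS = ["n", "v", "p", "r", "u", "wp"]
ARG_ROLES = ["O", "A0", "A1", "V"]  # assumed PropBank-style roles


def one_hot(label, inventory):
    """Return a one-hot list marking `label` within `inventory`."""
    return [1.0 if item == label else 0.0 for item in inventory]


def fuse_token_features(context_vecs, pos_tags, arg_roles):
    """Concatenate each token's contextual vector with POS and role one-hots."""
    return [
        list(vec) + one_hot(pos, POS_TAGS) + one_hot(role, ARG_ROLES)
        for vec, pos, role in zip(context_vecs, pos_tags, arg_roles)
    ]


class TinyTextCNN:
    """Kim-style CNN: 1-D convolutions, ReLU, max-over-time pooling, sigmoid."""

    def __init__(self, in_dim, window_sizes=(2, 3), filters_per_size=4, seed=0):
        rng = random.Random(seed)
        # Each filter is (window size k, k x in_dim weight matrix, bias).
        self.filters = [
            (k, [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(k)], 0.0)
            for k in window_sizes
            for _ in range(filters_per_size)
        ]
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in self.filters]
        self.out_b = 0.0

    def forward(self, tokens):
        """Return an (untrained) probability that the sentence is passive."""
        pooled = []
        for k, weights, bias in self.filters:
            # Slide the window over the sentence; ReLU on each position.
            activations = [
                max(0.0, bias + sum(
                    w * x
                    for row_w, row_x in zip(weights, tokens[i:i + k])
                    for w, x in zip(row_w, row_x)
                ))
                for i in range(len(tokens) - k + 1)
            ]
            # Max-over-time pooling; filters wider than the sentence give 0.
            pooled.append(max(activations) if activations else 0.0)
        logit = self.out_b + sum(w * p for w, p in zip(self.out_w, pooled))
        return 1.0 / (1.0 + math.exp(-logit))


# Usage: 他 被 批评 了 ("he was criticized"), with random stand-in vectors
# where a real system would use BERT token embeddings.
rng = random.Random(1)
ctx = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
fused = fuse_token_features(ctx, ["r", "p", "v", "u"], ["A1", "O", "V", "O"])
model = TinyTextCNN(in_dim=len(fused[0]))
print(len(fused[0]), model.forward(fused))  # first value is 18 (8 + 6 + 4)
```

Concatenation is the simplest fusion choice because it lets every convolution window see the lexical, grammatical, and argument-role signals of adjacent tokens together; the actual model may fuse these features differently.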
Key words
Chinese passive sentences /
automatic recognition /
feature fusion /
corpus
Funding
National Social Science Fund of China (21&ZD288)