An Adaptive Entity Recognition Method with Attention Mechanism

CHEN Qili1, HUANG Guanhe1,2, WANG Yuanzhuo2, ZHANG Kun2, DU Zeyao3

Journal of Chinese Information Processing, 2021, Vol. 35, Issue (6): 55-62, 73.
Information Extraction and Text Mining


Abstract

To address the laborious model reconstruction and the severe shortage of annotated corpora faced when named entity recognition is applied to emerging domains, this paper proposes a domain-adaptive named entity recognition method based on the attention mechanism. First, a BERT-BiLSTM-CRF named entity recognition model, combining the BERT (bidirectional encoder representations from transformers) pre-trained language model with a bidirectional long short-term memory network and a conditional random field, is built on a general-domain dataset. Then, this model is fine-tuned on an ancient Chinese corpus while an adaptive neural network layer based on the attention mechanism is inserted. Finally, comparison experiments are conducted against models trained in the target domain and against existing transfer learning methods. The results show that the adaptive transfer learning method reduces the dependence on target-domain corpora: the proposed attention-based adaptive model improves the F1 score by 4.31% over the general-domain BERT-BiLSTM-CRF model and by 2.46% over the BERT-BiLSTM-CRF model trained on the ancient Chinese domain, demonstrating that the method improves transfer from the source-domain model and enables the construction of cross-domain named entity recognition models.
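The abstract describes a BERT-BiLSTM-CRF tagger with an attention-based adaptive layer inserted during fine-tuning on the ancient Chinese corpus. The following is only a minimal PyTorch sketch of that idea under stated assumptions: the `AdaptiveNERTagger` class, the `freeze_source_layers` helper, the choice of `bert-base-chinese`, and the residual self-attention adaptation are illustrative, not the authors' implementation, and the CRF decoding layer used in the paper is omitted for brevity.

```python
# Illustrative sketch only (not the authors' code): BERT + BiLSTM tagger with an
# attention-based adaptation layer inserted for target-domain fine-tuning.
import torch
import torch.nn as nn
from transformers import BertModel


class AdaptiveNERTagger(nn.Module):
    def __init__(self, num_tags, bert_name="bert-base-chinese",
                 lstm_hidden=256, attn_heads=4):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)           # pre-trained encoder
        dim = self.bert.config.hidden_size
        self.bilstm = nn.LSTM(dim, lstm_hidden, batch_first=True,
                              bidirectional=True)                   # BiLSTM over BERT features
        # Adaptive layer: self-attention over the BiLSTM states, inserted when
        # fine-tuning the source-domain model on the ancient Chinese corpus.
        self.adapt_attn = nn.MultiheadAttention(2 * lstm_hidden, attn_heads,
                                                batch_first=True)
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)        # per-token tag scores

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(h)
        adapted, _ = self.adapt_attn(h, h, h,
                                     key_padding_mask=~attention_mask.bool())
        h = h + adapted                                              # residual adaptation
        return self.emission(h)   # in the full model these emissions feed a CRF


def freeze_source_layers(model: AdaptiveNERTagger) -> None:
    """Keep the general-domain BERT-BiLSTM weights fixed and train only the
    adaptation layer and the tag classifier on the target-domain corpus."""
    for module in (model.bert, model.bilstm):
        for p in module.parameters():
            p.requires_grad = False
```

A typical use, mirroring the adaptation step the abstract describes, would be to train the full model on the general-domain corpus first, then call `freeze_source_layers` and fine-tune only the inserted attention layer and the classifier on the ancient Chinese data.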


Key words

transfer learning / named entity recognition / ancient Chinese / BERT model

Cite this article

CHEN Qili, HUANG Guanhe, WANG Yuanzhuo, ZHANG Kun, DU Zeyao. An Adaptive Entity Recognition Method with Attention Mechanism. Journal of Chinese Information Processing. 2021, 35(6): 55-62,73


Funding

National Natural Science Foundation of China (U1836206); Beijing Municipal Education Commission Research Program (KM201811232016); Zhongyuan Thousand Talents Program, Zhongyuan Science and Technology Innovation Leading Talents Project; Beijing Postdoctoral Innovative R&D Program (ZZ201965); Chaoyang District Postdoctoral Innovative R&D Program (2019ZZ45); Key Research Cultivation Project for Promoting the Classified Development of Universities (2121YJPY211)