Abstract
For Chinese named entity recognition (NER), fusing character and word features through a word-character graph has proven to be an effective way to improve performance. In real-world scenarios, however, the external lexicon used to build the word-character graph and the training data are inconsistent in domain, style of expression, and other respects, so some of the introduced words match entities only incompletely. These incompletely matched words, which conflict with entities in boundary or in semantics, introduce noisy features when the model recognizes entity boundaries and types. To address this problem, this paper proposes a Chinese NER method based on contrastive learning: conflicting instances are treated as negative examples, and separate contrastive learning modules are designed for boundary conflicts and semantic conflicts. In addition, the paper proposes an improved Discounted InfoNCE loss to strengthen the semantic contrastive module's ability to distinguish semantically similar labels. Experiments on four public Chinese NER datasets show that the proposed method achieves state-of-the-art performance.
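For reference, the Discounted InfoNCE mentioned above builds on the standard InfoNCE loss. Given an anchor representation $z$, a positive $z^{+}$, negatives $\{z^{-}_{k}\}_{k=1}^{K}$, a similarity function $\mathrm{sim}(\cdot,\cdot)$, and a temperature $\tau$, the standard form is

$$
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right)}{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right) + \sum_{k=1}^{K} \exp\left(\mathrm{sim}(z, z^{-}_{k})/\tau\right)}
$$

One generic way to "discount" this loss is to weight each negative term in the denominator by a factor $w_{k} \in [0, 1]$:

$$
\mathcal{L}_{\mathrm{D\text{-}InfoNCE}} = -\log \frac{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right)}{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right) + \sum_{k=1}^{K} w_{k}\exp\left(\mathrm{sim}(z, z^{-}_{k})/\tau\right)}
$$

This weighted form is only an illustrative sketch of the general idea; the paper's exact discounting scheme for separating semantically similar labels is defined in the full text.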
Keywords
contrastive learning /
named entity recognition /
feature fusion
Funding
National Natural Science Foundation of China (62106273)