In Chinese named entity recognition, previous work has focused on introducing boundary information through external lexicons so that out-of-vocabulary words can be handled at inference time. However, existing methods build these lexicons automatically with statistics-based word segmentation tools: the low segmentation quality injects considerable noise into inference, and updating the lexicon means retraining the model, which is costly. This motivates the use of general text knowledge instead. This paper proposes a retrieval-augmented named entity recognition framework based on uncertain spans. The framework identifies the entity-level text spans in the input about which the model is most uncertain, and retrieves from an external knowledge base using these spans, thereby obtaining relevant knowledge text to disambiguate the input sample. In addition, the paper proposes a knowledge fusion model that combines the retrieved knowledge text to re-infer the uncertain samples. Experiments on four public benchmark datasets show that the framework significantly improves model performance, with an average F1 gain of 1.21% over the baseline models.
Abstract
Previous work on Chinese named entity recognition has focused on lexicon-based methods to determine entity boundaries, which suffer from unsatisfactory word segmentation results and require rebuilding the model when new words appear. In this paper, we propose an uncertainty-based retrieval framework for Chinese named entity recognition. The framework identifies entity-level uncertain spans in the input text and uses them to query an external knowledge base, disambiguating the input samples. Furthermore, we propose a Knowledge Fusion Model that resolves the uncertain samples by incorporating the retrieved knowledge. Experiments on four benchmark datasets demonstrate the effectiveness of the framework, with an average improvement of 1.21% in F-score.
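The two-stage pipeline the abstract describes (flag the entity-level spans the model is least certain about, then retrieve supporting text for them) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the entropy-based span scorer and the toy token-overlap retriever are stand-ins for whatever uncertainty estimate (e.g. MC dropout) and knowledge-base retriever the actual framework uses.

```python
import math

def token_entropy(probs):
    """Shannon entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertain_spans(tag_probs, spans, threshold=0.5):
    """Flag candidate entity spans whose mean token entropy exceeds a threshold.

    tag_probs: per-token label distributions from the NER model
    spans: candidate (start, end) entity spans, end exclusive
    Returns flagged spans, most uncertain first.
    """
    flagged = []
    for start, end in spans:
        mean_h = sum(token_entropy(tag_probs[i])
                     for i in range(start, end)) / (end - start)
        if mean_h > threshold:
            flagged.append((start, end, mean_h))
    return sorted(flagged, key=lambda s: -s[2])

def retrieve(query_tokens, knowledge_base, top_k=2):
    """Toy retriever: rank KB entries by token overlap with the uncertain span."""
    scored = [(len(set(query_tokens) & set(doc.split())), doc)
              for doc in knowledge_base]
    scored.sort(key=lambda x: -x[0])
    return [doc for score, doc in scored[:top_k] if score > 0]
```

Only the flagged spans trigger retrieval, so confident predictions pay no retrieval cost; the retrieved text would then be fed, together with the original input, to the knowledge fusion model.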
Key words
named entity recognition /
retrieval method /
neural network uncertainty
Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0108702); National Natural Science Foundation of China (62022027)