Abstract
Named Entity Recognition (NER) has long attracted attention as a fundamental task in Natural Language Processing. Chinese named entities, especially compound entities, have complex internal structure and can therefore grow quite long, and existing work still faces two problems: first, the association between characters and words is not fully exploited, so compound entities and simple entities cannot be handled uniformly; second, compound entities widen the variation in entity sequence length, so the information in the text itself is not adequately captured. For the first problem, this paper employs a highway network fused with a bidirectional attention mechanism to fully mine character-word associations, enriching word representations by extracting multiple effective character combinations inside each word. For the second, a self-attention mechanism captures textual information at multiple levels and from multiple perspectives, with a highway network bridging this information effectively. Experiments on the public OntoNotes V4.0 corpus demonstrate the effectiveness of the proposed approach: without using a large pre-trained language model, the proposed model based on two highway networks achieves the best performance to date.
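The highway networks mentioned above gate between a transformed representation and the raw input, which is how the paper bridges information across layers. The following is a minimal, hypothetical sketch of a single highway layer in plain Python; the weights, dimensions, and function names are illustrative only and are not taken from the paper's implementation.

```python
# Hypothetical sketch of one highway layer: y = T(x) * H(x) + (1 - T(x)) * x,
# where H is a nonlinear transform and T is a sigmoid "transform gate".
# All weights below are random placeholders, not the paper's parameters.
import math
import random

random.seed(0)

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, W, b):
    # W: (out x in) weight matrix as a list of rows, b: bias vector
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def highway(x, Wh, bh, Wt, bt):
    """One highway layer: gate between the transform H(x) and the raw x."""
    h = relu(linear(x, Wh, bh))        # candidate transform H(x)
    t = sigmoid(linear(x, Wt, bt))     # transform gate T(x), in (0, 1)
    return [ti * hi + (1.0 - ti) * xi  # y = T*H + (1-T)*x, element-wise
            for ti, hi, xi in zip(t, h, x)]

dim = 4
x = [random.uniform(-1.0, 1.0) for _ in range(dim)]
Wh = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
Wt = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
bh = [0.0] * dim
bt = [-2.0] * dim  # a negative gate bias favors carrying x through unchanged
y = highway(x, Wh, bh, Wt, bt)
print(y)
```

The gate bias controls how much of the input is carried through untouched: a strongly negative bias closes the transform gate, so the layer behaves like an identity connection, which is what makes highway layers useful for bridging representations across many layers.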
Key words
named entity recognition /
attention mechanism /
highway networks
Funding
National Natural Science Foundation of China (61876118, 61836007)