Korean Named Entity Recognition Based on Syllable-Morpheme Fusion

GAO Junlong, CUI Rongyi, ZHAO Yahui

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (4): 28-33.
Section: Ethnic, Cross-Border and Neighboring Language Information Processing


Abstract

Named entity recognition (NER) is one of the most fundamental tasks in Korean natural language processing. To address the unclear entity boundary delimitation and low accuracy of Korean NER, this paper proposes a Transformer-based Korean NER model with syllable-morpheme fusion. First, word embeddings are obtained separately for syllables and morphemes with a pre-trained BERT model. Next, the syllable and morpheme vectors are combined by two fusion methods: simple vector concatenation, and a heuristic fusion that accounts for the connections and differences between the two vectors. Finally, the fused vectors are fed into the model to complete the NER task. On the Korean NER dataset released with KLUE, the model achieves an F1-score of 88.78%, about 3 to 4 percentage points higher than the single-granularity baselines.
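The two fusion methods in the abstract can be sketched as follows. The exact heuristic formula is not given on this page; the sketch assumes a common heuristic matching scheme in which the two vectors are concatenated together with their element-wise difference and product, so that both their "connection" (product) and "difference" (subtraction) are represented. NumPy stands in for the actual embedding tensors.

```python
import numpy as np

def concat_fusion(syl, mor):
    """Simple fusion: concatenate the syllable and morpheme embeddings."""
    return np.concatenate([syl, mor], axis=-1)

def heuristic_fusion(syl, mor):
    """Assumed heuristic fusion: keep both embeddings plus their
    element-wise difference and product, capturing the connections
    and differences between the two granularities."""
    return np.concatenate([syl, mor, syl - mor, syl * mor], axis=-1)

# Toy 4-dimensional embeddings for a single token
syl = np.array([1.0, 0.5, -0.2, 0.0])
mor = np.array([0.5, 0.5, 0.2, 1.0])

print(concat_fusion(syl, mor).shape)     # (8,)  -> 2x embedding width
print(heuristic_fusion(syl, mor).shape)  # (16,) -> 4x embedding width
```

Either fused vector would then be passed to the Transformer encoder as the token representation; the heuristic variant trades a wider input layer for explicitly encoded interactions between the two granularities.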

Key words

Korean / named entity recognition / syllable-morpheme fusion / pre-training

Cite this article

GAO Junlong, CUI Rongyi, ZHAO Yahui. Korean Named Entity Recognition Based on Syllable-Morpheme Fusion. Journal of Chinese Information Processing, 2023, 37(4): 28-33.


Funding

National Language Commission 13th Five-Year Research Planning Project (YB135-76); Yanbian University Foreign Languages and Literatures World-Class Discipline Construction Research Projects (18YLPY13, 18YLPY14)