Cascaded Tibetan Named Entity Recognition Model with Pre-trained Language Model

XU Zehui, ZHU Jie, XU Zezhou, WANG Chao, YAN Songsi, LIU Yashan

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (11): 23-28.
Information Processing for Ethnic, Cross-border and Neighboring Languages

Abstract

Named entity recognition (NER) is a key task in Tibetan natural language processing. This paper proposes a Cascade-BiLSTM-CRF architecture combined with three Tibetan pre-trained models (Word2Vec, ELMo, and ALBERT). The cascade technique splits Tibetan NER into two sub-tasks, entity boundary detection and entity type classification, which are carried out in separate stages and simplify the model structure; the Tibetan pre-trained models let the model better capture prior knowledge of Tibetan. Experiments show that the Cascade-BiLSTM-CRF model reduces the training time per epoch by 28.30% compared with the BiLSTM-CRF model, and that combining the cascade technique with pre-training achieves better recognition results while also shortening training time.
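The two-stage split described in the abstract can be illustrated at the label level. The sketch below is not the authors' implementation: it assumes a BIO tagging scheme and uses hypothetical entity types (PER, LOC), neither of which is specified on this page. It only shows how joint tags decompose into a boundary sequence (stage 1) and one type per detected span (stage 2), and how the two stage outputs recombine.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def split_cascade_labels(tags: List[str]) -> Tuple[List[str], List[Span]]:
    """Split joint BIO tags into stage 1 boundary tags and stage 2 typed spans.

    Assumes well-formed BIO input such as ["B-PER", "I-PER", "O"].
    """
    # Stage 1 target: keep only the boundary prefix (B, I or O).
    boundary = [tag.split("-", 1)[0] for tag in tags]

    # Stage 2 target: one entity type per contiguous span.
    spans: List[Span] = []
    start = None
    ent_type = ""
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # Any tag other than I closes the span that is currently open.
        if prefix != "I" and start is not None:
            spans.append((start, i, ent_type))
            start = None
        if prefix == "B":
            start, ent_type = i, label
    if start is not None:
        spans.append((start, len(tags), ent_type))
    return boundary, spans


def merge_cascade_labels(boundary: List[str], span_types: List[str]) -> List[str]:
    """Recombine stage 1 boundary tags with one predicted type per span."""
    merged: List[str] = []
    types = iter(span_types)
    current = None
    for b in boundary:
        if b == "B":
            current = next(types)
            merged.append(f"B-{current}")
        elif b == "I" and current is not None:
            merged.append(f"I-{current}")
        else:
            current = None
            merged.append("O")
    return merged


# Hypothetical five-token sentence with two entities.
joint = ["B-PER", "I-PER", "O", "B-LOC", "O"]
boundary, spans = split_cascade_labels(joint)
restored = merge_cascade_labels(boundary, [t for _, _, t in spans])
```

Under this assumed scheme, a joint tagger over K entity types needs 2K + 1 labels, while the stage 1 boundary tagger needs only 3 regardless of K. That smaller label space is one plausible reading of how the cascade simplifies the sequence-labeling model.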

Key words

Tibetan NER / cascade / pre-training

Cite this article

XU Zehui, ZHU Jie, XU Zezhou, WANG Chao, YAN Songsi, LIU Yashan. Cascaded Tibetan Named Entity Recognition Model with Pre-trained Language Model. Journal of Chinese Information Processing. 2023, 37(11): 23-28

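Funding

Tibet University Enhancement Program (ZDTSJH21-07); National Natural Science Foundation of China (62066042); Humanities and Social Sciences Research Project of the Ministry of Education (21YJCZH059); Tibet University Cultivation Program (ZDCZJH21-10); 2021 Humanities and Social Sciences Research Project for Universities of the Tibet Autonomous Region (SK2021-24); Tibet University Everest Discipline Construction Program (zf22002001)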