Bridge Inspection Named Entity Recognition Based on Transformer-BiLSTM-CRF

LI Ren, LI Tong, YANG Jianxi, MO Tianjin, JIANG Shixin, LI Dong

Journal of Chinese Information Processing, 2021, Vol. 35, Issue 4: 83-91.
Section: Information Extraction and Text Mining


Abstract

Bridge inspection reports are among the most important data sources in China's bridge engineering domain. They contain rich key business information, such as structural component parameters and descriptions of detected defects, yet information extraction from such text has received little research attention. After clarifying the named entity recognition task for this domain, this paper analyzes the characteristics of the entities to be recognized: they contain many technical terms and also exhibit nested place or route names, character ambiguity, dependence on contextual position, and direction sensitivity. Based on this analysis, the paper proposes a Transformer-BiLSTM-CRF approach to bridge inspection named entity recognition. A Transformer encoder first models the long-distance, position-dependent contextual features of the character sequence of the inspection text; a BiLSTM network then captures direction-sensitive features; finally, a CRF model predicts the label sequence. Experimental results show that the proposed approach achieves better overall recognition performance than current mainstream named entity recognition models.
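The final stage described in the abstract, label sequence prediction in a CRF, is commonly carried out with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag set (`B`, `I`, `O`), the emission scores (which in the described pipeline would come from the Transformer and BiLSTM layers), and the transition scores are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag]: predecessor of `tag` on the best path at t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to the first position.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: three characters, BIO tags.
TAGS = ["B", "I", "O"]
# All transitions are neutral except O -> I, which is heavily penalized
# because an entity cannot continue without having begun.
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -1e4
# Made-up per-character scores standing in for the encoder output.
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.1, "O": 1.5},
]

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # ['B', 'I', 'O']
```

Choosing the best tag independently at each position would give `O, I, O` here, which violates the tagging scheme. Decoding over whole sequences with transition scores selects `B, I, O` instead, which illustrates why a CRF layer is often placed on top of a neural encoder for sequence labeling.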

Key words

named entity recognition / bridge inspection / Transformer

Cite this article

LI Ren, LI Tong, YANG Jianxi, MO Tianjin, JIANG Shixin, LI Dong. Bridge Inspection Named Entity Recognition Based on Transformer-BiLSTM-CRF. Journal of Chinese Information Processing. 2021, 35(4): 83-91

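Funding

National Natural Science Foundation of China (51608070); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201800705, KJQN201900726); Chongqing Jiaotong University National Natural Science Foundation project (2018PY34)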