Information Extraction and Text Mining
LI Ren, LI Tong, YANG Jianxi, MO Tianjin, JIANG Shixin, LI Dong
2021, 35(4): 83-91.
Information extraction from bridge inspection reports has received little attention, even though these reports contain a large amount of key business information such as structural component parameters and inspection descriptions. This paper defines the named entity recognition task in this field and identifies the characteristics of the entities to be recognized: nesting of location or route names, character ambiguity, contextual position correlation, and direction sensitivity. A named entity recognition approach for bridge inspection text is then proposed based on Transformer-BiLSTM-CRF. First, a Transformer encoder models the long-distance, position-dependent features of the text sequence. Next, a BiLSTM network further captures direction-sensitive features. Finally, a CRF model predicts the label sequence. Experimental results show that the proposed model outperforms mainstream named entity recognition models.
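The final CRF step can be illustrated with a minimal Viterbi decoding sketch in plain Python. This is not the authors' implementation: the tag names (`B-COMP`, `I-COMP`, `O`), the emission scores standing in for the Transformer-BiLSTM output, and the transition scores are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the per-token score for `tag` at position t
    (assumed to come from the upstream encoder); transitions[(a, b)] is
    the score for moving from tag a to tag b (missing pairs score 0.0).
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back[t-1][tag]: best previous tag

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to the start.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical three-token example with made-up scores.
TAGS = ["O", "B-COMP", "I-COMP"]
EMISSIONS = [
    {"O": 0.1, "B-COMP": 2.0, "I-COMP": 0.5},
    {"O": 1.0, "B-COMP": 0.2, "I-COMP": 0.9},
    {"O": 2.0, "B-COMP": 0.1, "I-COMP": 0.3},
]
TRANSITIONS = {
    ("B-COMP", "I-COMP"): 1.0,   # continuing an entity is encouraged
    ("I-COMP", "I-COMP"): 0.5,
    ("O", "I-COMP"): -10.0,      # an entity cannot start with an I- tag
}

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy example, picking the best tag for each token independently would label the second token `O`, whereas the transition scores lead the decoder to `I-COMP`. This is the kind of sequence-level consistency that a CRF layer is generally used for.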