LIN Guanghe, ZHANG Shaowu, LIN Hongfei. Named Entity Identification Based on Fine-Grained Word Representation[J]. Journal of Chinese Information Processing, 2018, 32(11): 62-71,78.
Named Entity Identification Based on Fine-Grained Word Representation
LIN Guanghe1, ZHANG Shaowu1,2, LIN Hongfei1
1.School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China;
2.School of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012, China
Abstract: Named entity recognition (NER) is a fundamental stage in natural language processing (NLP), and its performance strongly affects downstream pipelined NLP tasks such as relation extraction and semantic role labeling. Traditional statistical models require laborious feature engineering, and the resulting features adapt poorly across domains, while some neural network models neglect the morphological information of words. To address these problems, this paper proposes Finger-BiLSTM-CRF, an end-to-end neural network model based on a fine-grained word representation for the NER task. First, we design Finger, a character-level word representation model based on the attention mechanism that integrates morphological information with the information carried by each character of the current token. Second, we combine Finger with a BiLSTM-CRF model for named entity recognition. Trained in an end-to-end fashion, the model achieves an F1 score of 91.09% on the CoNLL 2003 test set. The experimental results show that Finger significantly boosts the recall of the NER system and thereby improves its overall recognition ability.
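To make the architecture described in the abstract concrete, the sketch below shows one plausible reading of a Finger-style module in PyTorch: a character-level BiLSTM with additive attention produces a morphological view of the token, and a learned gate blends it with the word embedding before the result is fed to a BiLSTM-CRF tagger. This is a minimal sketch; all module names, dimensions, the attention scorer, and the gating form are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

class CharAttentionWordRepr(nn.Module):
    """Attention-based character-level word representation (illustrative)."""
    def __init__(self, n_chars, char_dim=30, word_dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # BiLSTM over characters; its output dim matches word_dim so the
        # character-level and word-level views can be mixed directly.
        self.char_lstm = nn.LSTM(char_dim, word_dim // 2,
                                 bidirectional=True, batch_first=True)
        self.attn = nn.Linear(word_dim, 1)             # scores each char state
        self.gate = nn.Linear(2 * word_dim, word_dim)  # mixes the two views

    def forward(self, word_emb, char_ids):
        # word_emb: (batch, word_dim); char_ids: (batch, max_word_len), 0 = pad
        h, _ = self.char_lstm(self.char_emb(char_ids))     # (B, L, word_dim)
        mask = (char_ids != 0).unsqueeze(-1)
        scores = self.attn(h).masked_fill(~mask, float('-inf'))
        alpha = torch.softmax(scores, dim=1)               # attention weights
        char_repr = (alpha * h).sum(dim=1)                 # (B, word_dim)
        # A learned gate decides, per dimension, how much morphological
        # (character-level) evidence to blend into the word embedding.
        g = torch.sigmoid(self.gate(torch.cat([word_emb, char_repr], dim=-1)))
        return g * word_emb + (1 - g) * char_repr

# Example: batch of 2 tokens, character vocab of 50, max word length 6.
repr_layer = CharAttentionWordRepr(n_chars=50)
word_emb = torch.randn(2, 100)
char_ids = torch.randint(1, 50, (2, 6))
fine_grained = repr_layer(word_emb, char_ids)  # (2, 100), input to BiLSTM-CRF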
[1] Sang E F T K,De Meulder F.Introduction to the CoNLL-2003 shared task:Language-independent named entity recognition[C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003,2003:142-147.
[2] Ratinov L,Roth D.Design challenges and misconceptions in named entity recognition[C]//Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009),2009:147-155.
[3] Lin D,Wu X.Phrase clustering for discriminative learning[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP,2009:1030-1038.
[4] Passos A,Kumar V,McCallum A.Lexicon infused phrase embeddings for named entity resolution [C]//Proceedings of the Eighteenth Conference on Computational Natural Language Learning,2014:78-86.
[5] Luo G,et al.Joint named entity recognition and disambiguation[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,2015:1030-1038.
[6] Yang Ya,et al.MBNER:A multiple entity recognition system for the biomedical domain[J].Journal of Chinese Information Processing,2016,30(1):170-176.(in Chinese)
[7] Collobert R,et al.Natural language processing (almost) from scratch[J].Journal of Machine Learning Research,2011,12(1):2493-2537.
[8] Huang Z,Xu W,Yu K.Bidirectional LSTM-CRF models for sequence tagging[J].arXiv preprint arXiv:1508.01991,2015.
[9] Lample G,et al.Neural architectures for named entity recognition[C]//Proceedings of NAACL-HLT,2016:260-270.
[10] Chiu J P,Nichols E.Named entity recognition with bidirectional LSTM-CNNs[J].Transactions of the Association for Computational Linguistics,2016,4:357-370.
[11] Bahdanau D,et al.Neural machine translation by jointly learning to align and translate[C]//Proceedings of International Conference on Learning Representations,2015.
[12] Luong T,et al.Effective approaches to attention-based neural machine translation[C]//Proceedings of Empirical Methods in Natural Language Processing,2015:1412-1421.
[13] Rei M,et al.Attending to characters in neural sequence labeling models[C]//Proceedings of International Conference on Computational Linguistics,2016:309-318.
[14] Bharadwaj A,et al.Phonologically aware neural model for named entity recognition in low resource transfer settings[C]//Proceedings of Conference on Empirical Methods in Natural Language Processing,2016:1462-1472.
[15] Vaswani A,et al.Attention is all you need[C]//Proceedings of Neural Information Processing Systems,2017:6000-6010.
[16] Bengio Y,Simard P,Frasconi P.Learning long-term dependencies with gradient descent is difficult[J].IEEE Transactions on Neural Networks,1994,5(2):157-166.
[17] Hochreiter S,Schmidhuber J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[18] Graves A,Schmidhuber J.Framewise phoneme classification with bidirectional LSTM networks[C]//Proceedings of the International Joint Conference on Neural Networks,2005:2047-2052.
[19] Viterbi A.Error bounds for convolutional codes and an asymptotically optimum decoding algorithm[J].IEEE Transactions on Information Theory,1967,13(2):260-269.
[20] Bergstra J,et al.Theano:A CPU and GPU math compiler in Python[C]//Proceedings of the 9th Python in Science Conference,2010:3-10.
[21] Pascanu R,Mikolov T,Bengio Y.On the difficulty of training recurrent neural networks[C]//Proceedings of International Conference on Machine Learning,2013:1310-1318.
[22] He K,et al.Delving deep into rectifiers:Surpassing human-level performance on ImageNet classification[C]//Proceedings of the IEEE International Conference on Computer Vision,2015:1026-1034.
[23] Glorot X,Bengio Y.Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics,2010:249-256.
[24] Jozefowicz R,et al.An empirical exploration of recurrent network architectures[C]//Proceedings of International Conference on Machine Learning,2015:2342-2350.