Chinese Nested Named Entity Recognition Corpus Construction
LI Yanqun1,2, HE Yunqi1,2, QIAN Longhua1,2, ZHOU Guodong1,2
1. Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China;
2. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract Nested named entities encode rich entity information and the semantic relations between entities, which helps improve the effectiveness of information extraction. Because no uniform, standard Chinese nested named entity corpus exists, it is currently difficult to compare research on Chinese nested named entity recognition. Building on existing named entity corpora, this paper proposes a semi-automatic method for constructing two Chinese nested named entity corpora. First, we use the annotation information in the existing Chinese named entity corpora to automatically construct as many nested named entities as possible; we then manually adjust them to meet our annotation requirements for Chinese nested entities, yielding high-quality Chinese nested named entity corpora. Preliminary nested named entity recognition experiments, both within and across the corpora, show that Chinese nested named entity recognition remains a difficult problem that requires further research.
Received: 19 October 2017