Survey on Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution

ZHAO Jun

Journal of Chinese Information Processing ›› 2009, Vol. 23 ›› Issue (2) : 3-17.
Survey

Abstract

Named entities are important information-bearing linguistic units in text. Their recognition and analysis play a very important role in fields such as Web information extraction, Web content management and knowledge engineering. Research tasks concerning named entities include entity recognition, entity disambiguation, cross-lingual entity coreference resolution, entity attribute extraction and entity relation detection. This paper focuses on named entity recognition, disambiguation and cross-lingual coreference resolution. It surveys the state of the art of these tasks, including their difficulties, evaluations, existing methods and performance levels, and it analyzes and discusses the problems that most need to be solved next. The paper argues that the current performance of named entity recognition, disambiguation and cross-lingual coreference resolution is still far from meeting the requirements of large-scale real-world applications, and that deeper research is needed. In terms of research methods, work should move beyond the confines of natural language text and directly address the processing of massive, redundant, heterogeneous, ill-formed and noisy Web page information.

Key words

computer application / Chinese information processing / named entity recognition / named entity disambiguation / named entity cross-lingual coreference resolution

Cite this article

ZHAO Jun. Survey on Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution. Journal of Chinese Information Processing. 2009, 23(2): 3-17

Funding

Supported by the National 863 Program of China (2006AA01Z144) and the National Natural Science Foundation of China (60673042, 60875041)