Chinese Micro-blog Named Entity Linking Based on Multi-source Knowledge and Ranking SVM

CHEN Wanli, ZAN Hongying, WU Yonggang

Journal of Chinese Information Processing ›› 2015, Vol. 29 ›› Issue (5): 117-125.
Information Extraction and Text Mining


Chinese Micro-blog Named Entity Linking Based on Multisource Knowledge

  • CHEN Wanli, ZAN Hongying, WU Yonggang

Abstract

Named entities are important information-bearing units in text, and correctly resolving ambiguous named entities is key to understanding that text. This paper proposes a Chinese micro-blog named entity linking method based on multi-source knowledge and Ranking SVM. The method combines knowledge sources such as a dictionary of synonyms and encyclopedia resources to generate an initial set of candidate entities, extracts multiple combined features from the text, and uses Ranking SVM to rank the candidate set and select the target entity. Experiments on the NLP&CC2014 Chinese micro-blog entity linking evaluation data set yield a micro-average accuracy of 89.40%, which compares favorably with the best-performing system in that evaluation.
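The candidate-ranking step described in the abstract can be illustrated with a minimal pairwise sketch. It is not the authors' implementation: the two features (context similarity and popularity), the toy values, the candidate names, and the plain subgradient trainer are all assumptions made for illustration, standing in for the paper's feature set and whatever Ranking SVM solver it used. The idea shown is the standard pairwise reduction: for each mention, the correct entity's feature vector minus a wrong candidate's vector becomes a positive training example, and a linear max-margin model is fit on those differences.

```python
import random


def pairwise_differences(groups):
    """Build (difference vector, label) pairs from per-mention candidate lists.

    groups: one list per mention, each holding (features, is_gold) tuples.
    """
    pairs = []
    for candidates in groups:
        gold = [f for f, is_gold in candidates if is_gold]
        others = [f for f, is_gold in candidates if not is_gold]
        for g in gold:
            for o in others:
                diff = [a - b for a, b in zip(g, o)]
                pairs.append((diff, 1.0))                 # gold should outrank other
                pairs.append(([-d for d in diff], -1.0))  # mirrored pair
    return pairs


def train_linear_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on an L2-regularized hinge loss over the pairs."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, y in pairs:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(dim):
                grad = reg * w[i]
                if margin < 1.0:  # pair violates the margin
                    grad -= y * x[i]
                w[i] -= lr * grad
    return w


def rank_candidates(w, candidates):
    """Sort (name, features) candidates by descending linear score."""
    def score(item):
        return sum(wi * xi for wi, xi in zip(w, item[1]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy data: features are [context_similarity, popularity].
train_groups = [
    [([0.9, 0.2], True), ([0.1, 0.8], False), ([0.3, 0.3], False)],
    [([0.8, 0.5], True), ([0.2, 0.9], False)],
]
w = train_linear_ranker(pairwise_differences(train_groups), dim=2)

# Rank the candidates for a new mention; the top item is the predicted link.
ranked = rank_candidates(
    w, [("candidate_B", [0.2, 0.7]), ("candidate_A", [0.85, 0.3])]
)
print(ranked[0][0])
```

In this toy setup the correct entity always has higher context similarity than its competitors, so the learned weight on that feature is positive and the candidate with the stronger context match is ranked first. A real system would replace the trainer with a dedicated Ranking SVM solver and the two features with the combined features the paper extracts.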

Key words

named entity / Chinese micro-blog entity linking / dictionary of synonyms / encyclopedia resources / Ranking SVM / semantic features

Cite this article

CHEN Wanli, ZAN Hongying, WU Yonggang. Chinese Micro-blog Named Entity Linking Based on Multisource Knowledge. Journal of Chinese Information Processing. 2015, 29(5): 117-125


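CHEN Wanli (born 1992), corresponding author, master's degree. Main research area: natural language processing.
E-mail: wanli2013nlp@foxmail.com
ZAN Hongying (born 1966), professor. Main research area: natural language processing.
E-mail: iehyzan@zzu.edu.cn
WU Yonggang (born 1987), master's degree. Main research area: natural language processing.
E-mail: wygchina@sina.com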

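Funding

National Social Science Foundation of China (14BYY096); National Natural Science Foundation of China (61402419, 61272221); National High Technology Research and Development Program of China (863 Program) (2012AA011101); National Basic Research Program of China (973 Program) (2014CB340504)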