Abstract:A collaborative entity disambiguation method based on weighted feature overlap relatedness is proposed in this paper. This method make use of weighted feature overlap relatedness for computing the similarity between entity names. We define some deferent similarity formulas for computing entity similarity matrix, then the affinity propagation clustering algorithm is used to get the disambiguation results. Evaluation on the CLP-2012 corpus shows that our method can achieve competitive performance, attains 84.01% precision, 87.75% recall and 85.65% F-score.
[1] 赵军. 命名实体识别, 排歧和跨语言关联[J]. 中文信息学报, 2009, 23(2): 3-17. [2] Ji H, Grishman R. Knowledge base population: Successful approaches and challenges[C]//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011: 1148-1158. [3] Ji H, Grishman R, Dang H T, et al. Overview of the TAC 2010 knowledge base population track[C]//Proceedings of Third Text Analysis Conference (TAC 2010). 2010. [4] Artiles J, Gonzalo J, Sekine S. The semeval-2007 weps evaluation: Establishing a benchmark for the web people search task[C]//Proceedings of the 4th International Workshop on Semantic Evaluations. Association for Computational Linguistics, 2007: 64-69. [5] Wang Z H H, Li S. The Task 2 of CIPS-SIGHAN 2012 Named Entity Recognition and Disambiguation in Chinese Bakeoff[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012). 2012: 108-114. [6] Auer S, Bizer C, Kobilarov G, et al. Dbpedia: A nucleus for a web of open data[M]//The semantic web. Springer Berlin Heidelberg, 2007: 722-735. [7] Bollacker K, Evans C, Paritosh P, et al. Freebase: a collaboratively created graph database for structuring human knowledge[C]//Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 1247-1250. [8] Cucerzan S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data[C]//Proceedings of the EMNLP-CoNLL. 2007, 7: 708-716. [9] Milne D, Witten I H. Learning to Link with Wikipedia[C]//Proceedings of the 17th ACM conference on Information and knowledge management. ACM, 2008: 509-518. [10] Fan Q, ZAN H, CHAI Y, et al. Chinese personal name disambiguation based on vector space model[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012). 2012: 152-158. [11] Cilibrasi R L, Vitanyi P M B. The google similarity distance[J]. Knowledge and Data Engineering, IEEE Transactions on, 2007, 19(3): 370-383. [12] Wang L, Li S, Wong D F, et al. A joint chinese named entity recognition and disambiguation system[C]//Proceedings of The 2nd CIPSSIGHAN Joint Conference on Chinese Language Processing (CLP-2012). 2012: 146-151. [13] Liu J, Xu R, Lu Q, et al. Explore chinese encyclopedic knowledge to disambiguate person names[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012). 2012: 138-145. [14] Han W, Liu G, Mao Y, et al. Attribute based Chinese Named Entity Recognition and Disambiguation[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012) . 2012: 127-131. [15] Peng Z, Sun L, Han X. SIR-NERD: A Chinese Named Entity Recognition and Disambiguation System using a Two-Stage Method[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012). 2012: 114-120. [16] Minkov E, Cohen W W, Ng A Y. Contextual search and name disambiguation in email using graphs[C]//Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2006: 27-34. [17] Bekkerman R, McCallum A. Disambiguating web appearances of people in a social network[C]//Proceedings of the 14th international conference on World Wide Web. ACM, 2005: 463-470. [18] 郎君, 秦兵, 宋巍, 等. 基于社会网络的人名检索结果重名消解[J]. 计算机学报, 2009, 32(7): 1365-1374. [19] Hoffart J, Seufert S, Nguyen D B, et al. Kore: Keyphrase overlap relatedness for entity disambiguation[C] //Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012: 545-554. [20] Ikeda M, Ono S, Sato I, et al. Person name disambiguation on the web by two-stage clustering[C]//Proceedings of the 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference. 2009. [21] E Elmacioglu, Y Tan, S Yan, et al. PSNUS: Web People Name Disambiguation by Simple Clustering with Rich Features[C] //Proceedings of The SemEval-2007, 2007: 268-271. [22] Och F J. Minimum error rate training in statistical machine translation[C]//Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, 2003: 160-167. [23] Frey B J, Dueck D. Clustering by passing messages between data points[J]. Science, 2007, 315(5814): 972-976. [24] 刘挺, 车万翔, 李正华. 语言技术平台[J]. 中文信息学报, 2012, 25(6): 53-62. [25] http://genes.toronto.edu/index.php?q=affinity%20propagation[OL]. [26] Hao Zong, Derek F Wong, Lidia S Chao. A template based hybrid model for chinese personal name disambiguation[C]//Proceedings of The 2nd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012).2012: 121-126.