Learning to Rank Entities from Dual Feature Spaces
ZHAO Yixin1, NIU Shuzi1, JI Chunyan2, LU Fei2, XU Rui3
1. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. Qilu Hospital of Shandong University, Jinan, Shandong 250012, China; 3. Sinoparasoft Company Limited, Beijing 100190, China
Abstract: Entity retrieval from knowledge graphs has become increasingly important with the emergence of large-scale knowledge graphs and the growing industry demand for effective management of domain knowledge graphs. Given a knowledge graph and a user query, entity retrieval aims to return a list of entities from the knowledge graph ranked by their relevance to the query. Treating the task as matching between the query and entities, traditional entity retrieval models map both user queries and entities into the word feature space. However, this mapping fails when the words in an entity name are treated as independent of one another. In this paper, we propose to project both user queries and entities into a dual feature space, namely the entity-word feature space. First, we represent entities as multiple domains and extract ranking features from them. Then, learning-to-rank models are employed to train a ranking model on this dual feature space. Experimental results on benchmark datasets show that our proposed method significantly outperforms state-of-the-art baselines.
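The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Entity` class, the field names `names` and `attributes`, the four features, the toy data, and the pairwise perceptron standing in for a learning-to-rank model are all assumptions made for illustration. The sketch also assumes that the entities mentioned in the query have already been identified by some entity linker.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    # Hypothetical multi-field entity representation (field names are assumptions).
    name: str
    fields: dict                                # field name -> text
    related: set = field(default_factory=set)   # names of related entities

FIELDS = ("names", "attributes")  # assumed fields, not taken from the paper

def word_features(query, entity):
    """Word space: fraction of query words found in each entity field."""
    q = query.lower().split()
    feats = []
    for f in FIELDS:
        words = set(entity.fields.get(f, "").lower().split())
        feats.append(sum(w in words for w in q) / len(q))
    return feats

def entity_features(query_entities, entity):
    """Entity space: exact entity match and overlap with related entities."""
    exact = 1.0 if entity.name in query_entities else 0.0
    overlap = len(query_entities & entity.related) / max(len(query_entities), 1)
    return [exact, overlap]

def dual_features(query, query_entities, entity):
    # Concatenate word-space and entity-space features into one vector.
    return word_features(query, entity) + entity_features(query_entities, entity)

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, dim, epochs=20, lr=0.25):
    """Pairwise perceptron: push score(better) above score(worse) by margin 1."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            if score(w, better) - score(w, worse) < 1.0:
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

# Toy data (invented for illustration only).
query = "new york"
query_entities = {"New York"}  # assumed output of an entity linker
e1 = Entity("New York", {"names": "new york",
                         "attributes": "city in the united states"}, {"United States"})
e2 = Entity("York", {"names": "york", "attributes": "city in england"}, {"England"})
e3 = Entity("New Jersey", {"names": "new jersey",
                           "attributes": "state bordering new york"}, {"New York"})

x1, x2, x3 = (dual_features(query, query_entities, e) for e in (e1, e2, e3))
# Assumed relevance order for training: e1 > e3 > e2.
w = train_pairwise([(x1, x3), (x1, x2), (x3, x2)], dim=4)

ranked = sorted((e1, e2, e3),
                key=lambda e: score(w, dual_features(query, query_entities, e)),
                reverse=True)
print([e.name for e in ranked])  # ['New York', 'New Jersey', 'York']
```

In this toy example the word-space features alone rank "New Jersey" at least as high as "New York", because the words "new" and "york" are matched independently; the entity-space features are what let the learned weights separate the two.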