Abstract
With the emergence of large-scale knowledge graphs and the growing industry demand for efficiently managing domain knowledge graphs, ad-hoc entity retrieval over knowledge graphs has become an active research topic. Given a knowledge graph and a user query, entity retrieval aims to return a list of entities from the knowledge graph ranked by their relevance to the query. Viewed as a matching problem, most traditional entity retrieval models map both user queries and entities into a single word feature space. This has a clear drawback: for example, two words that belong to the same entity are treated as independent. In this paper, we propose to map user queries and entities into two feature spaces simultaneously, an entity space and a word space, an approach we call learning to rank in a dual feature space. We first represent each entity as a set of fields. We then extract ranking features separately from the word space and the entity space, and finally feed them to a learning-to-rank algorithm. Experimental results on benchmark datasets show that the proposed dual-feature-space model significantly outperforms state-of-the-art entity retrieval models.
Key words
knowledge graph /
entity retrieval /
dual feature space
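Funding
National Natural Science Foundation of China (61602451); Major Scientific and Technological Innovation Project of Shandong Province (2018CXGC1215); Fundamental Research Funds of Shandong University (26010177611018)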