Abstract
Query recommendation is an important technique in search engines: it suggests more suitable queries to improve the user's search experience. Existing methods find similar queries that are directly linked through some shared property, but they miss semantically related queries that are linked only indirectly. This paper models user queries and the direct relations between them as a query relation graph and, building on the graph-structural similarity algorithm SimRank, proposes weighted SimRank (WSimRank) for query recommendation. WSimRank takes into account the edge information and global structure of the query relation graph, so it can uncover indirect associations and latent semantic relations between queries. Because the complexity of basic WSimRank is too high for practical use on large real-world query relation graphs, the paper recasts WSimRank as the traversal and computation of a state hierarchy graph and optimizes it with dynamic programming and pruning. Experiments on large-scale real Web search logs show that WSimRank outperforms SimRank and conventional query recommendation methods on every evaluation metric, with a MAP close to 0.9.
Key words: computer application; Chinese information processing; search engine; query suggestion; SimRank; WSimRank
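The abstract does not state the WSimRank formula, so the following is only a minimal sketch of the general idea it describes: SimRank-style similarity propagated over a query relation graph whose edges carry weights. The weighting scheme (each node's edge weights normalized to sum to 1), the function name `weighted_simrank`, the decay value, and the toy queries are all assumptions made for illustration. They are not the paper's definitions, and the sketch includes none of the state-graph, dynamic-programming, or pruning optimizations.

```python
from itertools import product


def weighted_simrank(graph, decay=0.8, iterations=5):
    """Naive SimRank-style iteration with edge weights (illustrative only).

    graph: dict mapping node -> {neighbor: positive weight}; every neighbor
    must itself be a key of graph. Returns {(a, b): similarity}.
    """
    nodes = sorted(graph)

    # ASSUMPTION: weights are normalized per node so they sum to 1.
    norm = {}
    for a in nodes:
        total = sum(graph[a].values())
        norm[a] = {b: w / total for b, w in graph[a].items()} if total > 0 else {}

    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            # Weighted average of the similarities of the two neighborhoods.
            acc = 0.0
            for i, p_ai in norm[a].items():
                for j, p_bj in norm[b].items():
                    acc += p_ai * p_bj * sim[(i, j)]
            new_sim[(a, b)] = decay * acc
        sim = new_sim
    return sim


# Hypothetical query relation graph: the first and third queries share a
# neighbor but have no direct edge between them.
toy_graph = {
    "cheap flights": {"airline tickets": 3.0},
    "airline tickets": {"cheap flights": 3.0, "flight deals": 1.0},
    "flight deals": {"airline tickets": 1.0},
}

scores = weighted_simrank(toy_graph)
print(scores[("cheap flights", "flight deals")])
```

In this toy graph, "cheap flights" and "flight deals" receive a non-zero score even though no edge joins them, which is the kind of indirect association the abstract refers to. The naive loop visits every node pair on every iteration, which illustrates why an unoptimized computation becomes impractical on large graphs.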
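Funding
Supported by the National Natural Science Foundation of China (60603094); the Beijing Natural Science Foundation (4082030); the National 973 Program of China (2007CB311103); and the National 863 Program of China (2006AA010105).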