Abstract

Information retrieval technology aims to satisfy users' information needs from massive information resources. In recent years, many techniques have improved average retrieval effectiveness over traditional simple models, but they often ignore robustness: the performance of many individual queries is degraded, which significantly hurts user satisfaction. This paper proposes a query performance prediction method based on learning to rank to obtain robust ranking results. For each query, the method predicts the performance of the ranked result lists produced by multiple retrieval models and presents the list with the best predicted performance to the user. A series of comparative experiments on three standard LETOR benchmark datasets (OHSUMED, MQ2008, and MSLR-WEB10K) show that, with the classic BM25 model as the baseline, the proposed method significantly reduces the number of degraded queries compared with LambdaMART, one of the state-of-the-art ranking models, while achieving nearly the same improvement in average effectiveness, and thus exhibits good robustness.
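The per-query selection step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `toy_predictor` is a hypothetical stand-in for the learned query performance predictor (which the paper trains with learning to rank), and the candidate rankings are toy data.

```python
from typing import Callable, Dict, List

Ranking = List[str]

def select_best_ranking(
    query: str,
    rankings: Dict[str, Ranking],
    predict: Callable[[str, Ranking], float],
) -> Ranking:
    """Core selection step: among the result lists produced by several
    retrieval models, return the one with the highest predicted
    performance for this query."""
    best_name = max(rankings, key=lambda name: predict(query, rankings[name]))
    return rankings[best_name]

# Hypothetical predictor: reciprocal rank of the first document that
# mentions the query term (a real predictor would use learned features).
def toy_predictor(query: str, ranking: Ranking) -> float:
    for i, doc in enumerate(ranking, start=1):
        if query in doc:
            return 1.0 / i
    return 0.0

if __name__ == "__main__":
    candidates = {
        "BM25":       ["apple pie recipe", "banana bread", "apple tart"],
        "LambdaMART": ["banana bread", "apple pie recipe", "apple tart"],
    }
    best = select_best_ranking("apple", candidates, toy_predictor)
    print(best[0])  # the list that ranks an "apple" document first wins
```

In a full system the predictor would be trained on query/result-list features, and the candidate set would include the baseline model so that the selected list can never be predicted worse than the baseline's.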
Key words
query performance prediction /
learning to rank /
robust ranking
Funding
Supported by the National Natural Science Foundation of China (61232010, 61173008); the National High-Tech R&D Program of China (863 Program) (2012AA011003, 2013AA01A213); the National Basic Research Program of China (973 Program) (2012CB316303, 2013CB329602); and the National Key Technology R&D Program of the Ministry of Science and Technology, "Eleventh Five-Year" Plan (2012BAH39B02, 2012BAH46B04)