XUE Yuanhai (1,2,3), YU Xiaoming (1,2), LIU Yue (1,2), GUAN Feng (1,2,3), CHENG Xueqi (1,2)
1. Key Laboratory of Network Data Science and Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 3. University of Chinese Academy of Sciences, Beijing 100190, China
Abstract: The main purpose of information retrieval technology is to satisfy users' information needs using massive information resources. In recent years, many techniques have improved average retrieval effectiveness over traditional simple models, but they often ignore robustness: the results of many individual queries are degraded, which significantly hurts user satisfaction. To obtain robust ranking results, a query performance prediction method based on learning to rank is proposed. For each query, the performance of multiple ranking results generated by different models is predicted, and the best one is presented to the user. A series of experiments is conducted on three standard LETOR benchmark datasets: OHSUMED, MQ2008, and MSLR-WEB10K. The results show that, compared with LambdaMART, a state-of-the-art ranking model, the ranking results obtained this way significantly reduce the number of queries whose performance is hurt relative to the BM25 baseline, while achieving nearly the same improvement in average effectiveness.
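The per-query selection procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the feature set, the gradient-boosted regression predictor, and the names select_ranking, X_train, and y_train are all assumed here for concreteness.

```python
# Minimal sketch of per-query ranking selection via predicted performance.
# Assumption: a regressor is trained to map features of a ranked list to a
# measured effectiveness score (e.g., NDCG@10); the concrete predictor and
# features are illustrative choices, not those used in the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Training data: for each (query, ranking-model) pair, a feature vector
# describing the ranked list, and its measured effectiveness.
X_train = rng.random((500, 8))   # hypothetical ranking-quality features
y_train = rng.random(500)        # measured effectiveness of each ranked list

# One plausible regression-based performance predictor.
predictor = GradientBoostingRegressor(n_estimators=100)
predictor.fit(X_train, y_train)

def select_ranking(candidate_features, candidate_rankings):
    """For one query, predict the effectiveness of each candidate ranked
    list and return the list with the highest predicted score."""
    scores = predictor.predict(candidate_features)
    return candidate_rankings[int(np.argmax(scores))]

# Per-query usage: candidate rankings produced by different models,
# e.g., BM25 and LambdaMART (document IDs are hypothetical).
features = rng.random((2, 8))          # one feature vector per candidate list
rankings = [["doc3", "doc1", "doc7"],  # ranking from model A (e.g., BM25)
            ["doc1", "doc3", "doc2"]]  # ranking from model B (e.g., LambdaMART)
print("selected ranking:", select_ranking(features, rankings))
```

The design point the abstract makes is that selection happens per query: even if one model is better on average, another model's ranking is shown whenever its predicted performance is higher, which is what reduces the number of queries degraded relative to the baseline.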