Abstract: Learning to rank is one of the most attractive research areas in information retrieval. Much attention has been paid to the robustness of ranking algorithms against noise, which is inevitable in training sets. Previous work observed that the ranking performance of the same algorithm shows markedly different noise sensitivities on different training sets. Because the performance degradation of a ranking model originates in its training set, the underlying reason for these different sensitivities must lie in some attribute of the training data. Experimental results on LETOR3.0 suggest that the more dispersedly the document pairs of a training set are scattered, the less the resulting model is influenced by erroneous document pairs, and hence the less sensitive the training set is to noise.
Key words: learning to rank; data quality; noise sensitivity