Document ranking has long been one of the core tasks in information retrieval (IR). Benefiting from the strong modeling capability of Markov decision processes and the powerful optimization capability of reinforcement learning, ranking models based on reinforcement learning have been proposed in recent years and have achieved promising results. However, because the candidate set contains a large number of irrelevant documents, trial-and-error reinforcement learning methods suffer from low learning efficiency. To address this problem, this paper proposes IR-DAGGER, an imitation learning-based learning-to-rank algorithm that constructs an expert policy from document relevance annotations, improving learning efficiency while preserving ranking accuracy. To evaluate IR-DAGGER, experiments were conducted on the OHSUMED dataset for relevance ranking and the TREC dataset for diversified ranking; the results show that IR-DAGGER improves both the accuracy and the efficiency of document ranking on the two datasets.
Abstract
Document ranking is one of the central tasks in a number of IR applications. In recent years, efforts have been made to apply reinforcement learning to learning document ranking models, and a number of methods have been developed. Although preliminary success has been achieved, existing reinforcement learning methods still suffer from the sparseness of relevant documents among the candidates. In this paper, we propose to involve ground-truth ranking lists during the learning process, yielding a novel imitation learning-based learning to rank algorithm called IR-DAGGER. It utilizes the ranking lists sampled by the expert policy, which enhances learning efficiency while maintaining ranking accuracy. Experimental results on OHSUMED and TREC show that IR-DAGGER outperforms state-of-the-art baselines on the tasks of relevance ranking and diverse ranking, indicating the effectiveness and efficiency of imitation learning in document ranking.
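The abstract describes IR-DAGGER only at a high level. To make the DAgger-style interaction between a learner policy and a label-derived expert policy concrete, the following is a minimal, self-contained Python sketch on toy synthetic data. The linear scoring model, the greedy expert, the softmax training loss, and all names (expert_action, fit_policy, etc.) are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
"""A rough sketch of DAgger-style imitation learning for document ranking.
Assumptions (not from the paper): a linear scoring policy, an expert that
greedily picks the most relevant remaining document by its label, and a
softmax cross-entropy surrogate for policy training on toy data."""
import numpy as np

rng = np.random.default_rng(0)

# Toy query: 20 candidate documents, 5 features each, graded relevance labels.
N_DOCS, N_FEATURES, RANK_LEN = 20, 5, 10
X = rng.normal(size=(N_DOCS, N_FEATURES))                    # document feature vectors
true_w = rng.normal(size=N_FEATURES)
labels = (X @ true_w > 0.5).astype(float) + (X @ true_w > 1.5)  # relevance grades 0/1/2

def expert_action(remaining):
    """Expert policy: choose the remaining document with the highest relevance label."""
    return max(remaining, key=lambda d: labels[d])

def learner_action(w, remaining):
    """Learner policy: choose the remaining document with the highest linear score."""
    return max(remaining, key=lambda d: float(X[d] @ w))

def rollout(w, beta):
    """Build a ranking step by step; with probability beta follow the expert,
    otherwise the learner. Record the expert's choice at every visited state."""
    remaining, states = list(range(N_DOCS)), []
    for _ in range(RANK_LEN):
        a_star = expert_action(remaining)
        states.append((list(remaining), a_star))              # aggregated (state, expert action)
        a = a_star if rng.random() < beta else learner_action(w, remaining)
        remaining.remove(a)
    return states

def fit_policy(dataset, w, lr=0.1, epochs=20):
    """Train the scoring weights so the expert's pick outscores the other remaining
    documents (softmax cross-entropy over each recorded state)."""
    for _ in range(epochs):
        for remaining, a_star in dataset:
            scores = np.array([X[d] @ w for d in remaining])
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()
            grad = sum(p * X[d] for p, d in zip(probs, remaining)) - X[a_star]
            w -= lr * grad                                     # gradient of -log P(expert action)
    return w

# DAgger main loop: aggregate expert-labelled states from successive roll-outs.
w = np.zeros(N_FEATURES)
dataset = []
for it in range(10):
    beta = 0.5 ** it                                           # anneal the expert mixing coefficient
    dataset.extend(rollout(w, beta))
    w = fit_policy(dataset, w)

# Evaluate: DCG@10 of the learned policy's ranking vs. the ideal ranking.
def dcg(ranking):
    return sum(labels[d] / np.log2(i + 2) for i, d in enumerate(ranking))

learned, remaining = [], list(range(N_DOCS))
for _ in range(RANK_LEN):
    d = learner_action(w, remaining)
    learned.append(d)
    remaining.remove(d)
ideal = sorted(range(N_DOCS), key=lambda d: -labels[d])[:RANK_LEN]
print(f"DCG@10 learned: {dcg(learned):.3f}  ideal: {dcg(ideal):.3f}")
```

Under these assumptions, the key point the sketch illustrates is that supervision always comes from the expert's choice at states the learner actually visits, which is what distinguishes a DAgger-style scheme from pure trial-and-error reinforcement learning.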
Key words
learning to rank /
imitation learning /
reinforcement learning
Funding
National Natural Science Foundation of China (61872338, 61832017, 61773362, 61425016, 61472401, 61722211, 61906180); Beijing Outstanding Young Scientist Program (BJJWZYJH012019100020098); Beijing Academy of Artificial Intelligence (BAAI2019ZD0305); Research Funds of Renmin University of China (2018030246); Youth Innovation Promotion Association CAS Excellent Member Program (20144310, 2016102); National Key R&D Program of China (2016QY02D0405)