一种基于排序学习方法的查询扩展技术

徐 博,林鸿飞,林 原,王 健

PDF(1974 KB)
PDF(1974 KB)
中文信息学报 ›› 2015, Vol. 29 ›› Issue (3) : 155-161.
信息检索与问答系统

一种基于排序学习方法的查询扩展技术

  • 徐 博,林鸿飞,林 原,王 健
作者信息 +

A Query Expansion Method Based on Learning to Rank

  • XU Bo, LIN Hongfei, LIN Yuan, WANG Jian
Author information +
History +

摘要

查询扩展作为一门重要的信息检索技术,是以用户查询为基础,通过一定策略在原始查询中加入一些相关的扩展词,从而使得查询能够更加准确地描述用户信息需求。排序学习方法利用机器学习的知识构造排序模型对数据进行排序,是当前机器学习与信息检索交叉领域的研究热点。该文尝试利用伪相关反馈技术,在查询扩展中引入排序学习算法,从文档集合中提取与扩展词相关的特征,训练针对于扩展词的排序模型,并利用排序模型对新查询的扩展词集合进行重新排序,将排序后的扩展词根据排序得分赋予相应的权重,加入到原始查询中进行二次检索,从而提高信息检索的准确率。在TREC数据集合上的实验结果表明,引入排序学习算法有助于提高伪相关反馈的检索性能。

Abstract

Query Expansion is an important technique for improving retrieval performance. It uses some strategies to add some relevant terms to the original query submitted by the user, which could express the user’s information need more exactly and completely. Learning to rank is a hot machine learning issue addressed in in information retrieval, seeking to automatically construct ranking models determining the relevance degrees between objects. This paper attempts to improve pseudo-relevance feedback by introducing learning to rank algorithm to re-rank expansion terms. Some term features are obtained from the original query terms and the expansion terms, learning from which we can get a new ranking list of expansion terms. Adding the expansion terms list to the original query, we can acquire more relevant documents and improve the rate of accuracy. Experimental results on the TREC dataset shows that incorporating ranking algorithms in query expansion can lead to better retrieval performance.

关键词

信息检索 / 查询扩展 / 伪相关反馈 / 排序学习

Key words

information retrieval / query expansion / pseudo-relevance feedback / learning to rank

引用本文

导出引用
徐 博,林鸿飞,林 原,王 健. 一种基于排序学习方法的查询扩展技术. 中文信息学报. 2015, 29(3): 155-161
XU Bo, LIN Hongfei, LIN Yuan, WANG Jian. A Query Expansion Method Based on Learning to Rank. Journal of Chinese Information Processing. 2015, 29(3): 155-161

参考文献

[1] G Cao, J Y Nie, S Robertson. Selecting good expansion terms for pseudo-relevance feedback[C]//Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 2008: 243-250.
[2] L Matthew, A James, C Bruce. Regression Rank: Learning to Meet the Opportunity of Descriptive Queries[C]//Proceedings of the 32th European Conference on IR Research, Toulouse, France, 2009: 90-101.
[3] L Matthew. An Improved Markov Random Field Model for Supporting Verbose Queries[C]//Proceedings of SIGIR2009, American, 2009: 476-483.
[4] C J Lee, R C Chen, S H Kao and P J Cheng. A Term Dependency-Based Approach for Query Terms Ranking[C]//Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, 2009: 1267-1276.
[5] T Y Liu. Learning to Rank for Information Retrieval [J]. Foundations and Trends in Information Retrieval, 2009, 3(3): 225-331.
[6] T Qin, T Y Liu, J Xu, et al. LETOR: Benchmark Collection for Research on Learning to Rank for Information Retrieval[C]//Proceedings of SIGIR 2007 Workshop on Learning to Rank for Information Retrieval (LR4IR 2007), Amsterdam, The Netherlands, 2007: 3-10.
[7] Y Freund, R D Iyer, R E Schapire, et al. An Efficient Boosting Algorithm for Combining Preferences[J]. Journal of Machine Learning Research, 2003, 4:933-969.
[8] Y Cao, J Xu, T Liu, et al. Adapting ranking SVM to document retrieval[C]//Proceedings of SIGIR2006, Seattle, WA, USA, 2006:186-193.
[9] T Joachims. Optimizing search engines using clickthrough data[C]//Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002: 133-142.
[10] U Ozertem, O Chapelle, P Donmez, et al. Learning to suggest: a machine learning framework for ranking query suggestions[C]//Proceedings of SIGIR2012, Portland, OR, USA, 2012: 25-34.
[11] R Jones, B Rey, O Madani, et al. Generating query substitutions[C]//Proceedings of WWW2006, Edinburgh, Scotland, UK, 2006: 387-396.
[12] Y Lin, H Lin, S Jin, et al. Social Annotation in Query Expansion: a Machine Learning Approach[C]//Proceedings of SIGIR2011, Beijing, China, 2011: 405-414.
[13] V Lavrenko, W B Croft. Relevance based language models[C]//Proceedings of SIGIR2001. New Orleans, Louisiana, 2001: 186-193
[14] J Ponte, W Croft. A language modeling approach to information retrieval[C]//Proceedings of SIGIR1998, Melbourne, Australia, 1998: 275-281.

基金

国家自然科学基金(61277370、61402075);国家863高科技计划(2006AA01Z151);辽宁省自然科学基金(201202031、2014020003)、教育部留学回国人员科研启动基金和高等学校博士学科点专项科研基金(20090041110002);中央高校基本科研业务费专项资金。
PDF(1974 KB)

512

Accesses

0

Citation

Detail

段落导航
相关文章

/