A New Pseudo Relevance Feedback Based on Pseudo Document

YAN Rong; GAO Guanglai

Journal of Chinese Information Processing, 2016, Vol. 30, Issue (6): 156-163.
Survey

Abstract

Classical Pseudo Relevance Feedback (PRF) methods usually take whole documents as the unit from which expansion terms are extracted. This granularity is too coarse: it lowers the quality of the expansion source and makes retrieval results less robust. Applying topic analysis techniques, this paper instead uses the semantic content of the text as the expansion unit, in order to mitigate the low quality of the expansion source. It proposes and implements a pseudo document description of the content of each document in the collection, and applies implicit diversification to these descriptions so that expansion terms are extracted at a finer level of document content. Retrieval experiments on the real NTCIR8 Chinese corpus show that the method effectively improves PRF retrieval performance.
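
For context, the document-level baseline that the abstract contrasts with can be sketched roughly as follows. This is only a minimal illustration of classical PRF, not the authors' pseudo-document method: the function name `expansion_terms`, the TF-IDF-style weighting, and the parameters `k` (number of pseudo-relevant documents) and `m` (number of expansion terms) are all assumptions made for this example.

```python
import math
from collections import Counter


def expansion_terms(query, ranked_docs, collection, k=2, m=3):
    """Pick m expansion terms from the top-k documents of an initial ranking.

    Hypothetical helper: documents are lists of tokens, and the weighting
    (pooled term frequency times a smoothed IDF) is an illustrative choice.
    """
    n_docs = len(collection)
    # Document frequency of each term over the whole collection.
    df = Counter(term for doc in collection for term in set(doc))
    # Term frequency pooled over the top-k (assumed relevant) documents.
    tf = Counter(term for doc in ranked_docs[:k] for term in doc)
    query_terms = set(query)
    scores = {
        term: freq * math.log((n_docs + 1) / (df[term] + 0.5))
        for term, freq in tf.items()
        if term not in query_terms  # do not re-add original query terms
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:m]


collection = [
    ["topic", "model", "retrieval", "lda"],
    ["query", "expansion", "retrieval", "feedback"],
    ["feedback", "retrieval", "topic", "topic"],
    ["cooking", "recipe", "pasta"],
]
# Assumed output of a first-pass retrieval for the query ["retrieval"].
ranked_docs = [collection[2], collection[1], collection[0]]
print(expansion_terms(["retrieval"], ranked_docs, collection, k=2, m=3))
```

Because every term of each top-ranked document enters the pool, off-topic passages in those documents contribute candidate terms too, which is the granularity problem the abstract describes.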

Key words

Pseudo Relevance Feedback (PRF) / pseudo document / topic analysis / latent topic
 

Cite this article

YAN Rong; GAO Guanglai. A New Pseudo Relevance Feedback Based on Pseudo Document. Journal of Chinese Information Processing. 2016, 30(6): 156-163


Funding

National Natural Science Foundation of China (61263037, 61662053); Natural Science Foundation of Inner Mongolia (2014BS0604)