Local Co-occurrence Based Query Expansion for Information Retrieval

DING Guo-dong, BAI Shuo, WANG Bin

Journal of Chinese Information Processing, 2006, Vol. 20, Issue 3: 86-93.

Abstract

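To address the word mismatch problem between documents and queries in information retrieval, this paper proposes LOCOOC, a query expansion method based on local co-occurrence. LOCOOC evaluates the quality of a candidate expansion term by how strongly it co-occurs with all of the query terms in the local document set, and it also incorporates the term's global statistics in the corpus, so that the selected expansion terms are more closely related to the topic or concept expressed by the original query. Experimental results show that, compared with no query expansion, expansion with LOCOOC improves average precision by more than 40%. Compared with traditional local feedback and with Local Context Analysis (LCA), LOCOOC not only achieves better retrieval performance but is also more robust.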
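The abstract does not state LOCOOC's scoring formula. For orientation, here is a minimal Python sketch of the Local Context Analysis (LCA) scoring function of Xu and Croft, which LOCOOC is compared against and whose idea it draws on. It shows the shared principle: a candidate term is scored by its co-occurrence with every query term in the top-ranked documents, weighted by a collection-wide IDF. This is not the authors' LOCOOC formula; the function and parameter names (`lca_style_scores`, `collection_df`, `num_docs`), the default delta = 0.1 and the capped IDF follow LCA as commonly described and are assumptions here.

```python
import math
from collections import Counter


def lca_style_scores(query_terms, top_docs, collection_df, num_docs, delta=0.1):
    """Rank candidate expansion terms by co-occurrence with *all* query terms.

    Sketch of LCA-style scoring (Xu and Croft), not the LOCOOC formula.
    query_terms:   list of query words
    top_docs:      token lists of the top-ranked (local) documents
    collection_df: hypothetical dict, term -> document frequency in the whole collection
    num_docs:      number of documents N in the whole collection
    """
    tfs = [Counter(doc) for doc in top_docs]
    # log10(n) normalises co-occurrence by the size of the local set;
    # max(..., 2) avoids dividing by log10(1) == 0 for a single document.
    log_n = math.log10(max(len(tfs), 2))

    def idf(term):
        # Global statistic from the whole collection, capped at 1.0.
        df = collection_df.get(term, 1)
        return min(1.0, math.log10(num_docs / df) / 5.0)

    candidates = {t for tf in tfs for t in tf} - set(query_terms)
    scores = {}
    for c in candidates:
        score = 1.0
        for w in query_terms:
            # Local co-occurrence: sum over documents of tf(c, d) * tf(w, d).
            co = sum(tf[c] * tf[w] for tf in tfs)
            co_degree = math.log10(co + 1) * idf(c) / log_n
            # Product over query terms: no co-occurrence with some query
            # term leaves only delta, which drags the whole score down.
            score *= (delta + co_degree) ** idf(w)
        scores[c] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On toy data, a term that appears alongside both query terms outranks one that appears with only a single query term, because the missing factor contributes only delta raised to that query term's IDF.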

Techniques for automatic query expansion have been extensively studied in information retrieval research as a solution to the word mismatch problem between queries and documents. Building on the idea of Local Context Analysis, this paper proposes a novel expansion method, LOCOOC, which uses local co-occurrence information in the top-ranked documents together with global statistical information from the whole collection to select the most appropriate expansion terms. Experimental results show that LOCOOC offers more effective and more robust retrieval performance than expansion methods based on local feedback or on LCA.
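The abstract does not describe the local feedback baseline further. A common form of local feedback is Rocchio-style pseudo-relevance feedback, sketched below in Python under stated assumptions: the top-k documents are treated as relevant, documents are weighted by tf-idf with a natural-log IDF, and `k`, `m`, `alpha` and `beta` are hypothetical defaults, not the paper's settings. The function name `rocchio_local_feedback` is likewise illustrative.

```python
import math
from collections import Counter


def rocchio_local_feedback(query_terms, top_docs, collection_df, num_docs,
                           k=10, m=3, alpha=1.0, beta=0.75):
    """Rocchio-style pseudo-relevance (local) feedback, positive part only.

    Treats the top-k retrieved documents as relevant and returns the expanded
    query as {term: weight}: alpha * q + beta * centroid(top-k tf-idf vectors),
    restricted to the original terms plus the m highest-weighted new terms.
    collection_df is a hypothetical dict, term -> collection document frequency.
    """
    feedback = top_docs[:k]
    expanded = {t: alpha for t in query_terms}
    if not feedback:
        return expanded
    # Centroid of the tf-idf vectors of the feedback documents.
    centroid = Counter()
    for doc in feedback:
        for term, tf in Counter(doc).items():
            centroid[term] += tf * math.log(num_docs / collection_df.get(term, 1))
    scale = beta / len(feedback)
    # Reweight the original query terms (values change, keys do not).
    for term in expanded:
        expanded[term] += scale * centroid[term]
    # New terms are ranked by centroid weight alone.
    new_terms = [(t, scale * w) for t, w in centroid.items() if t not in expanded]
    new_terms.sort(key=lambda tw: tw[1], reverse=True)
    expanded.update(new_terms[:m])
    return expanded
```

Unlike scoring schemes that require co-occurrence with every query term, ranking new terms by centroid weight alone lets a term that is frequent in the feedback documents but co-occurs with only one query term enter the expanded query.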

Keywords

computer application / Chinese information processing / information retrieval / local co-occurrence / query expansion / LOCOOC

Cite this article

DING Guo-dong, BAI Shuo, WANG Bin. Local Co-occurrence Based Query Expansion for Information Retrieval. Journal of Chinese Information Processing, 2006, 20(3): 86-93.

Funding

Supported by the National Basic Research Program of China (973 Program), project no. 2004CB318109.