Rank-based Weighted Insertion Results Fusion Algorithm in Web IR

ZHANG Min, JIN Yi-jiang, MA Shao-ping

Journal of Chinese Information Processing, 2004, Vol. 18, Issue (2): 9-15.

Abstract

The sheer volume and diversity of data on the Internet make effective retrieval over distributed collections a necessary mode of Web information retrieval (Web IR), which in turn raises the problem of fusing multiple retrieval results. This paper proposes a new solution for the case where the similarity scores of different result lists may be entirely incomparable: a rank-based weighted insertion results fusion algorithm. Experiments on an 18 GB large-scale standard Web test collection show that the algorithm consistently improves overall retrieval performance, and that the better the retrieval results on the distributed collections, the larger the gain after merging. System average precision improves by nearly 10%. The algorithm thereby overcomes a limitation of traditional result fusion methods, in which the merged result from distributed collections always performs worse than retrieval over a single centralized collection.
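The abstract names the technique but does not spell out its insertion rule. As a rough illustration of what rank-only fusion can look like, the Python sketch below merges several ranked lists without ever comparing their similarity scores. Everything in it is an assumption made for illustration and is not the authors' algorithm: the function name `weighted_rank_merge`, the per-collection `weights`, and the rule that a document at rank r in a list with weight w gets the virtual position r / w are all hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_rank_merge(
    ranked_lists: Dict[str, Sequence[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Merge ranked lists using rank positions only (no raw scores).

    Hypothetical rule, assumed for illustration: a document at rank r
    (1-based) in a list with weight w gets virtual position r / w, so a
    higher-weight list has its documents inserted earlier and more densely.
    """
    candidates = []
    for name, docs in ranked_lists.items():
        w = weights[name]
        if w <= 0:
            raise ValueError(f"weight for {name!r} must be positive")
        for rank, doc in enumerate(docs, start=1):
            candidates.append((rank / w, -w, name, doc))

    # Order by virtual position; ties go to the higher-weight list,
    # then to the list name so the result is deterministic.
    candidates.sort(key=lambda c: (c[0], c[1], c[2]))

    merged: List[str] = []
    seen = set()
    for _, _, _, doc in candidates:
        if doc not in seen:  # keep only the best position of a duplicate
            seen.add(doc)
            merged.append(doc)
    return merged


example = weighted_rank_merge(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]},
    {"A": 2.0, "B": 1.0},
)
print(example)
```

With these made-up weights, list A contributes two documents for every one from list B. How such weights would be chosen (for example, from an estimate of each collection's retrieval quality) is left open here, since the abstract does not describe it.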

Key words

computer application / Chinese information processing / Web IR / collection selection / result fusion / rank-based fusion

Cite this article

ZHANG Min, JIN Yi-jiang, MA Shao-ping. Rank-based Weighted Insertion Results Fusion Algorithm in Web IR. Journal of Chinese Information Processing. 2004, 18(2): 9-15


Funding

National Key Basic Research (973) Program (G1998030509); Natural Science Foundation project (60223004); National 863 High-Tech Program (2001AA114082)