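To meet users' ever-rising demand for accurate information retrieval results, an effective approach is to exploit as much information related to the query and the retrieval results as possible when optimizing those results. Query expansion and result re-ranking are both methods that use such additional information. This paper proposes a document re-ranking model based on document cliques (the DCRM model). The model learns from the document collection to construct a Markov network of document-to-document relations, extracts the "document cliques" from this document Markov network, and uses the document clique information to re-rank documents. Experiments on five datasets (adi, cacm, med, cisi, and cran) show that the proposed model improves on the performance of the BM25 model.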
Abstract
Document re-ranking is an effective way to meet users' demand for high-precision information retrieval. This paper presents a document re-ranking model based on document cliques, which are extracted from a document Markov network constructed from the corpus. Incorporating document clique information into document re-ranking is shown to be effective, yielding better precision than the BM25 model on the adi, cacm, med, cisi, and cran datasets.
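The pipeline summarized in the abstract (build a document graph, extract cliques, re-rank) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: cosine similarity over raw term counts stands in for the learned document-document relation, `threshold` decides which edges exist, cliques are enumerated with the basic Bron-Kerbosch recursion, and `rerank` uses a simple hypothetical rule that adds to each document's initial score (for example, its BM25 score) a share `alpha` of the mean initial score of its clique-mates. The abstract does not state how DCRM actually weights clique information.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(docs, threshold):
    """Link two documents when their similarity reaches the threshold.

    Assumption: a thresholded similarity graph stands in for the
    document Markov network described in the abstract.
    """
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    graph = {d: set() for d in docs}
    ids = sorted(docs)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if cosine(vecs[u], vecs[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def rerank(initial_scores, cliques, alpha=0.5):
    """Hypothetical rule: boost each score by alpha times the mean
    initial score of the document's clique-mates, then sort descending."""
    boosted = dict(initial_scores)
    for clique in cliques:
        if len(clique) < 2:
            continue  # a single document carries no clique evidence
        for d in clique:
            mates = [initial_scores.get(m, 0.0) for m in clique if m != d]
            boosted[d] = boosted.get(d, 0.0) + alpha * sum(mates) / len(mates)
    return sorted(boosted, key=boosted.get, reverse=True)
```

To try it, pass a dictionary mapping document ids to text into `build_graph`, feed the resulting graph to `maximal_cliques`, and call `rerank` with the initial retrieval scores. Documents that sit in a clique with other highly scored documents move up the list, which is the intuition behind using clique information for re-ranking.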
Keywords
computer application /
Chinese information processing /
Markov network /
document clique /
document re-ranking
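Funding
Supported by the National Natural Science Foundation of China (60663007); the Jiangxi Provincial Science and Technology Research Project (20062184); the Science and Technology Project of the Jiangxi Provincial Department of Education (20072129); and the Natural Science Foundation of Jiangxi Province (2007GZS2168).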