Semantic Positional Language Retrieval Models with A Proximity Information

GONG Xiaolong, WANG Mingwen, WAN Jianyi, WANG Xiaoqing

Journal of Chinese Information Processing, 2015, Vol. 29, Issue 4: 183-191.
Information Retrieval and Question Answering Systems

Abstract

Traditional retrieval models match documents against queries mainly through statistical features of terms, such as term frequency, inverse document frequency, and document length. Recent studies show that using the positions at which query terms match within a document can improve the accuracy of retrieval results, but how best to characterize and model this positional information remains an open problem. Building on the semantic positional language model (SPLM), this paper additionally takes term proximity into account, gives a smoothing strategy that uses a Dirichlet prior to compute proximity, and proposes a positional language retrieval model that incorporates proximity. Experiments on standard data sets show that the proposed model outperforms the semantic positional language model.
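
As a point of reference for the smoothing idea mentioned in the abstract, the following is a minimal sketch of standard Dirichlet-prior smoothing for a document language model, together with a Gaussian-kernel positional variant of the kind used in the positional language model literature. It is not the SPLM or the proximity formula of this paper, which the abstract does not specify. All function names, the toy tokens, and the values of `mu` and `sigma` are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(i, j, sigma):
    """Weight with which a term at position j contributes to position i."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def dirichlet_prob(term, doc_tokens, collection_prob, mu=2000.0):
    """Dirichlet-smoothed p(term | doc) = (c(term, doc) + mu * p(term | C)) / (|doc| + mu)."""
    count = doc_tokens.count(term)
    return (count + mu * collection_prob) / (len(doc_tokens) + mu)


def positional_dirichlet_prob(term, doc_tokens, position, collection_prob,
                              mu=2000.0, sigma=50.0):
    """Dirichlet-smoothed p(term | doc, position) using kernel-propagated counts.

    Illustrative assumption: counts are propagated with a Gaussian kernel,
    and the sum of kernel weights over all positions plays the role of the
    document length in the smoothing formula.
    """
    propagated = sum(
        gaussian_kernel(position, j, sigma)
        for j, tok in enumerate(doc_tokens)
        if tok == term
    )
    virtual_length = sum(
        gaussian_kernel(position, j, sigma) for j in range(len(doc_tokens))
    )
    return (propagated + mu * collection_prob) / (virtual_length + mu)


if __name__ == "__main__":
    doc = ["x", "q", "x", "x", "x", "x"]  # toy document, "q" occurs at index 1
    print(dirichlet_prob("q", doc, collection_prob=0.1, mu=1.0))
    print(positional_dirichlet_prob("q", doc, 1, 0.1, mu=1.0, sigma=1.0))
    print(positional_dirichlet_prob("q", doc, 5, 0.1, mu=1.0, sigma=1.0))
```

In this sketch, a position close to an occurrence of the term receives a higher smoothed probability than a distant one, which is the intuition behind rewarding proximity between query term matches.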

Key words

semantic positional language model / Dirichlet smoothing / proximity information / retrieval model

Cite this article

GONG Xiaolong, WANG Mingwen, WAN Jianyi, WANG Xiaoqing. Semantic Positional Language Retrieval Models with A Proximity Information. Journal of Chinese Information Processing. 2015, 29(4): 183-191

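Funding

National Natural Science Foundation of China (60963014, 61163006, 61203313); Natural Science Foundation of the Jiangxi Provincial Department of Science and Technology (20132BAB201038)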