GONG Xiaolong, WANG Mingwen, WAN Jianyi, WANG Xiaoqing. Semantic Positional Language Retrieval Models with A Proximity Information[J]. Journal of Chinese Information Processing, 2015, 29(4): 183-191.
Semantic Positional Language Retrieval Models with A Proximity Information
GONG Xiaolong, WANG Mingwen, WAN Jianyi, WANG Xiaoqing
School of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Abstract: In most existing retrieval models, the relevance between a document and a query is computed from statistical features such as within-document term frequency, inverse document frequency, and document length. Recent studies show that term position information can improve retrieval precision, but how best to exploit it remains an open issue. This paper integrates term proximity information into the semantic positional language model (SPLM), using Dirichlet prior smoothing when computing proximity. In experiments, the proposed proximity-enhanced semantic positional language retrieval models outperform the classical SPLM.
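The abstract does not give the scoring function, so the following is only a minimal sketch, under stated assumptions, of the two ingredients it names. The positional part follows the positional language model (PLM) of Lv and Zhai: term counts are propagated to every position with a Gaussian kernel, each position's model is smoothed with a Dirichlet prior, and a document is scored at its best position. The proximity part uses Tao and Zhai's MinDist measure (smallest distance between two different query terms), turned into the boost log(alpha + exp(-delta)). The semantic component of SPLM is omitted. The additive combination in `combined_score`, the fallback to document length when fewer than two query terms occur, and the default values of `sigma`, `mu`, and `alpha` are all assumptions for illustration, not the paper's method or settings.

```python
import math
from collections import Counter


def gaussian_kernel(i, j, sigma):
    """Weight with which a term at position j is propagated to position i."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def plm_score(doc, query, coll_prob, sigma=50.0, mu=1000.0):
    """Best-position PLM score with Dirichlet prior smoothing.

    Returns max over positions i of sum_w p(w|Q) * log p_mu(w|D, i), which is
    rank-equivalent to -KL(theta_Q || theta_{D,i}).
    coll_prob maps each query term to p(w|C) > 0.
    sigma and mu defaults are illustrative, not tuned values from the paper.
    """
    q_counts = Counter(query)
    best = -math.inf
    for i in range(len(doc)):
        weights = [gaussian_kernel(i, j, sigma) for j in range(len(doc))]
        z_i = sum(weights)  # total propagated count at position i
        score = 0.0
        for w, qc in q_counts.items():
            # Propagated count c'(w, i): kernel weights of positions holding w.
            c_wi = sum(k for k, tok in zip(weights, doc) if tok == w)
            p = (c_wi + mu * coll_prob[w]) / (z_i + mu)  # Dirichlet smoothing
            score += (qc / len(query)) * math.log(p)
        best = max(best, score)
    return best


def min_dist(doc, query):
    """Smallest distance between occurrences of two different query terms,
    or None if fewer than two distinct query terms occur in doc."""
    q_terms = set(query)
    last_pos = {}  # most recent position of each query term seen so far
    best = None
    for pos, tok in enumerate(doc):
        if tok not in q_terms:
            continue
        for other, p in last_pos.items():
            if other != tok and (best is None or pos - p < best):
                best = pos - p
        last_pos[tok] = pos
    return best


def proximity(doc, query, alpha=0.3):
    """MinDist-based proximity boost log(alpha + exp(-delta)); alpha is illustrative."""
    delta = min_dist(doc, query)
    if delta is None:
        delta = len(doc)  # assumed fallback when no pair of query terms co-occurs
    return math.log(alpha + math.exp(-delta))


def combined_score(doc, query, coll_prob, **plm_kwargs):
    """Hypothetical combination: PLM score plus an additive proximity term."""
    return plm_score(doc, query, coll_prob, **plm_kwargs) + proximity(doc, query)


if __name__ == "__main__":
    coll = {"language": 0.01, "model": 0.02}
    q = ["language", "model"]
    d1 = "a positional language model for retrieval".split()
    d2 = "language is used here and a retrieval model there".split()
    # Adjacent query terms in d1 earn a larger proximity boost than in d2.
    print(combined_score(d1, q, coll) > combined_score(d2, q, coll))  # True
```

The position loop is quadratic in document length, which keeps the sketch short; a practical implementation would truncate the kernel or precompute propagated counts instead.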