A Language Model Based on Category Prior for Question Retrieval


Journal of Chinese Information Processing, 2014, Vol. 28, Issue 4: 98-103.

Information Retrieval and Social Computing


JI Zongcheng^1,2, WANG Bin^1


Abstract

Community Question Answering (CQA) services have accumulated large archives of question-answer pairs organized into a hierarchy of categories. Reusing these valuable historical question-answer pairs requires effective Question Retrieval (QR) models. In this paper, we propose a novel approach, built within the language modeling framework, that uses the category prior of questions to improve QR performance. Specifically, we propose a new language model based on category prior that treats the leaf category language model as the Dirichlet hyperparameter weighting the parameters of the unigram language model. The approach has a solid mathematical foundation. Experiments on a large-scale real-world CQA dataset from Yahoo! Answers show that the proposed method significantly outperforms previous work that simply combines category information with the unigram language model by linear interpolation.
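The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact formulation, which this page does not give. It assumes the standard Dirichlet-prior estimate, with pseudo-counts taken from a leaf category model, and compares it with fixed-weight linear interpolation. The function names, the parameters `mu`, `lam`, and `floor`, and the toy questions are all hypothetical.

```python
import math
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def p_linear(word, question, cat_lm, lam=0.2):
    """Baseline: fixed-weight linear interpolation with the category model."""
    p_q = Counter(question)[word] / len(question)
    return (1 - lam) * p_q + lam * cat_lm.get(word, 0.0)


def p_category_prior(word, question, cat_lm, mu=10.0):
    """Dirichlet-prior estimate whose pseudo-counts are mu * P(word | category).

    Assumption: this is the textbook Dirichlet form, used here only to
    illustrate the idea of a category model acting as the hyperparameter.
    """
    count = Counter(question)[word]
    return (count + mu * cat_lm.get(word, 0.0)) / (len(question) + mu)


def score(query, question, cat_lm, prob=p_category_prior, floor=1e-9):
    """Query log-likelihood under the smoothed model of an archived question."""
    return sum(math.log(max(prob(w, question, cat_lm), floor)) for w in query)


# Toy data (hypothetical): two archived questions from one leaf category.
q1 = "how to train a puppy".split()
q2 = "best food for a small puppy that will not eat dry food".split()
cat_lm = ml_model(q1 + q2)

query = "puppy food".split()
ranked = sorted([q1, q2], key=lambda q: score(query, q, cat_lm), reverse=True)
```

Under these assumptions, the Dirichlet form is algebraically the same as linear interpolation with a question-dependent weight `mu / (len(question) + mu)`, so shorter questions lean more heavily on the category model, whereas the baseline applies the same weight to every question.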

Cite this article

JI Zongcheng, WANG Bin. A Language Model Based on Category Prior for Question Retrieval. Journal of Chinese Information Processing. 2014, 28(4): 98-103


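Funding

National Natural Science Foundation of China (61070111); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)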


Received: 2013-01-10
Published: 2014-04-10
Issue date: 2014-04-10
