Abstract
Traditional question answering (QA) systems return an answer to a question directly and offer no user interaction, whereas community-based question answering (CQA) systems hold large archives of question-answer pairs that can be exploited. This paper proposes an LDA-based matching framework for finding similar questions. The framework computes question similarity from three kinds of information (statistical, semantic, and topic) and combines them into an overall similarity score. Experiments on a real, annotated dataset extracted from Yahoo! Answers show that the method performs well.
Key words
question similarity /
LDA topic model /
community question answering /
similarity calculation
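The abstract says that statistical, semantic, and topic similarities are combined into one overall score, but it does not say how. The sketch below shows one possible reading under stated assumptions: a weighted linear combination with equal weights, cosine over raw term counts as a stand-in for the statistical component, and cosine between LDA topic distributions as a stand-in for the topic component. The function names, the weights, and the semantic score passed in as a plain number are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def statistical_similarity(q1, q2):
    """Assumed stand-in: cosine over whitespace-tokenized term counts."""
    return cosine(Counter(q1.lower().split()), Counter(q2.lower().split()))


def topic_similarity(theta1, theta2):
    """Assumed stand-in: cosine between two topic distributions.

    theta1 and theta2 are equal-length lists of topic proportions,
    for example as inferred by an LDA model trained elsewhere.
    """
    return cosine(dict(enumerate(theta1)), dict(enumerate(theta2)))


def overall_similarity(stat, sem, topic, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed combination rule: weighted sum of the three scores."""
    w_stat, w_sem, w_topic = weights
    return w_stat * stat + w_sem * sem + w_topic * topic


if __name__ == "__main__":
    q1 = "how do I learn python"
    q2 = "how can I learn python fast"
    stat = statistical_similarity(q1, q2)
    sem = 0.8  # placeholder; a real semantic score would come from a lexical resource
    topic = topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
    print(overall_similarity(stat, sem, topic))
```

In practice the weights would be tuned on labeled question pairs rather than fixed at equal values.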
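Funding
Supported by the National Natural Science Foundation of China (60673039, 60973068); the National Social Science Foundation of China (08BTQ025); the National High Technology Research and Development Program of China (863 Program) (2006AA01Z151); the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; and the Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002).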