Query Structure Analysis Based on PMI

ZHU Yadong, ZHANG Cheng, YU Xiaoming, CHENG Xueqi

Journal of Chinese Information Processing ›› 2012, Vol. 26 ›› Issue (5): 33-40.
Review

Abstract

In Web search engines, effective analysis of the structure of user queries helps the engine understand the user's query intent and improves retrieval effectiveness. This paper proposes a simple and efficient query structure analysis method based on pointwise mutual information (PMI). The method consists of an offline training algorithm based on MapReduce and a bottom-up online algorithm for building the query analysis tree. Experiments show that the approach segments queries at high speed while maintaining segmentation quality comparable to that of other approaches. Experiments on the TREC WT10g dataset further validate the method, showing that it improves retrieval results in terms of MAP, p@5, and p@10.
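
The abstract names the two components but does not give their details, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes PMI is computed between adjacent query terms from unigram and bigram counts, that a single-process `Counter` stands in for the MapReduce counting job, and that the tree is built by greedily merging the adjacent pair of nodes with the highest boundary PMI. The function names `train_counts`, `pmi`, and `build_query_tree` are hypothetical.

```python
import math
from collections import Counter


def train_counts(queries):
    """Offline step (assumed): count terms and adjacent term pairs in a query log.

    A MapReduce job would emit (term, 1) and ((t1, t2), 1) in the map phase
    and sum them in the reduce phase; Counter does the same in one process.
    """
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        terms = query.split()
        unigrams.update(terms)
        bigrams.update(zip(terms, terms[1:]))
    return unigrams, bigrams


def pmi(x, y, unigrams, bigrams):
    """PMI(x, y) = log(p(x, y) / (p(x) * p(y))); -inf for unseen pairs."""
    if bigrams[(x, y)] == 0 or unigrams[x] == 0 or unigrams[y] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    return math.log(p_xy / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))


def leaves(node):
    """Return the terms under a node; a node is a term or a (left, right) pair."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def build_query_tree(query, unigrams, bigrams):
    """Online step (assumed): bottom-up greedy merge of adjacent nodes.

    The score of a boundary is the PMI of the two terms that touch it.
    Returns nested tuples, or None for an empty query.
    """
    nodes = query.split()
    if not nodes:
        return None
    while len(nodes) > 1:
        scores = [
            pmi(leaves(a)[-1], leaves(b)[0], unigrams, bigrams)
            for a, b in zip(nodes, nodes[1:])
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
    return nodes[0]
```

With counts from a toy log in which "new york" co-occurs often, the query "cheap new york hotel" is grouped as `("cheap", (("new", "york"), "hotel"))`: the strongest pair is merged first and the weakest boundary ends up at the root, which is one way a flat segmentation can be read off the tree.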

Key words

query structure analysis / MapReduce / online query analysis tree

Cite this article

ZHU Yadong, ZHANG Cheng, YU Xiaoming, CHENG Xueqi. Query Structure Analysis Based on PMI. Journal of Chinese Information Processing. 2012, 26(5): 33-40

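Funding

Supported by the National Natural Science Foundation of China (60903139, 60873243, 60933005) and Key Projects of the National High Technology Research and Development Program of China (863 Program) (2010AA012502, 2010AA012503)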