Abstract
Effective analysis of the structure of user queries helps a Web search engine understand the user's intent and improves retrieval performance. This paper proposes a simple and efficient method for analyzing user query structure based on PMI (pointwise mutual information). The method consists of an offline training algorithm based on MapReduce and a bottom-up algorithm that builds the query analysis tree online. Experiments show that the approach segments queries at high speed while achieving segmentation performance comparable to other approaches. Experiments on the TREC WT10g dataset further validate the method and show that it improves retrieval results in terms of MAP, p@5, and p@10.
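The abstract does not give the authors' formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions. It uses the standard definition PMI(x, y) = log(p(x, y) / (p(x) p(y))) with unigram and adjacent-bigram relative frequencies as probability estimates, simulates the MapReduce counting job in-process with hypothetical `map_query` and `reduce_counts` functions, and builds the query tree bottom-up by repeatedly merging the adjacent pair of subtrees whose boundary has the highest PMI. The function names, the probability estimates, and the `-inf` score for unseen bigrams are all assumptions for illustration, not the paper's actual algorithm.

```python
import math
from collections import Counter
from itertools import chain


def map_query(query):
    """Hypothetical map step: emit (key, 1) for every unigram and adjacent bigram."""
    terms = query.split()
    for term in terms:
        yield ("uni", term), 1
    for left, right in zip(terms, terms[1:]):
        yield ("bi", left, right), 1


def reduce_counts(pairs):
    """Hypothetical reduce step: sum the values emitted for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def train(query_log):
    """Offline training (assumed): run map and reduce over the log, then precompute totals."""
    counts = reduce_counts(chain.from_iterable(map_query(q) for q in query_log))
    n_uni = sum(v for k, v in counts.items() if k[0] == "uni")
    n_bi = sum(v for k, v in counts.items() if k[0] == "bi")
    return counts, n_uni, n_bi


def pmi(model, left, right):
    """Standard PMI of an adjacent word pair; unseen bigrams are assumed to score -inf."""
    counts, n_uni, n_bi = model
    joint = counts[("bi", left, right)]  # Counter returns 0 for unseen keys
    if joint == 0:
        return float("-inf")
    p_xy = joint / n_bi
    p_x = counts[("uni", left)] / n_uni
    p_y = counts[("uni", right)] / n_uni
    return math.log(p_xy / (p_x * p_y))


def build_tree(model, query):
    """Bottom-up online step (assumed): greedily merge the adjacent subtrees with the highest boundary PMI."""
    terms = query.split()
    nodes = list(terms)
    # scores[i] is the PMI across the boundary between nodes[i] and nodes[i + 1]
    scores = [pmi(model, a, b) for a, b in zip(terms, terms[1:])]
    while len(nodes) > 1:
        i = max(range(len(scores)), key=scores.__getitem__)
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]  # replace the two subtrees by their parent
        del scores[i]  # the merged boundary disappears; its neighbours keep their scores
    return nodes[0] if nodes else None
```

For example, after `model = train(["new york hotels", "new york", "cheap hotels", "york hotels"])`, the call `build_tree(model, "new york hotels")` returns `(("new", "york"), "hotels")`, because the "new york" boundary has the higher PMI. Since a boundary's score depends only on the two words that meet there, the scores are computed once per query and each merge simply deletes one entry, which keeps the online step cheap.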
Key words: query structure analysis; MapReduce; online query analysis tree
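Funding
Supported by the National Natural Science Foundation of China (60903139, 60873243, 60933005) and the Key Projects of the National 863 Program of China (2010AA012502, 2010AA012503).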