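Research on Long-tail Query Search Performance Evaluation

HUO Shuai, ZHANG Min, LIU Yiqun, MA Shaoping, JIN Yijiang, RU Liyun

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 3: 75-80.
Information Retrieval and Social Computing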

Abstract

Search engine companies strive to help users find the information they need accurately and quickly, so evaluating search performance has become increasingly important. However, there is currently no established method for evaluating performance on long-tail (rare) queries. This paper analyzes long-tail query result data and extracts three types of features, then examines the correlation among the features and tests different feature combinations. To address the severe imbalance of the data set, two data balancing approaches are proposed. Finally, an evaluation method for long-tail queries is proposed and then improved. Experiments on a real search engine result data set show that the proposed method is reasonably effective, and that it identifies non-relevant documents with relatively high precision.

Key words

long-tail query / search engine performance evaluation / automatic evaluation method

Cite this article

HUO Shuai, ZHANG Min, LIU Yiqun, MA Shaoping, JIN Yijiang, RU Liyun. Research on Long-tail Query Search Performance Evaluation. Journal of Chinese Information Processing. 2014, 28(3): 75-80

Funding

National 863 High-Tech Program of China (2011AA01A205); Natural Science Foundation of China (60903107, 61073071)