Search Engine User Behavior Analysis Based on Log Mining

CEN Rongwei, LIU Yiqun, ZHANG Min, RU Liyun, MA Shaoping

Journal of Chinese Information Processing ›› 2010, Vol. 24 ›› Issue (3) : 49-55.
Review


Abstract

As the number of web search users grows rapidly, user behavior analysis has become an important foundation for architecture analysis, performance optimization, and system maintenance in web information retrieval systems, and it is a major research area in both web information retrieval and knowledge mining. To better understand how web users search, we analyze 756 million entries of real user click-through logs. We examine query length, the query refinement rate, the click-through rate on related-search suggestions, the distribution of first and last click positions, and the distribution of the number of clicks per query. We also compare user behavior across query sets of different types to study how behavior differs with the underlying information need. The results may serve as a reference for optimizing search engine algorithms and improving search systems.
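As a rough illustration of the kinds of statistics listed in the abstract, the following minimal sketch computes mean query length, first and last click position distributions, and the distribution of clicks per query from a toy log. The record layout `(session_id, query, click_rank)`, the function name `summarize`, and the choice to measure query length in characters are all assumptions made for this example; they are not taken from the paper, whose log format and definitions are not given on this page.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Summarize a click-through log.

    `records` is a chronologically ordered iterable of hypothetical tuples
    (session_id, query, click_rank). click_rank is the 1-based result
    position that was clicked, or None if the query drew no click.
    """
    # Group clicked ranks by (session, query); insertion order is preserved.
    clicks = defaultdict(list)
    for session_id, query, rank in records:
        key = (session_id, query)
        clicks[key]  # register the query even when it has no clicks
        if rank is not None:
            clicks[key].append(rank)

    first_click = Counter()       # rank of the first click per query
    last_click = Counter()        # rank of the last click per query
    clicks_per_query = Counter()  # number of clicks -> number of queries
    lengths = []                  # query length in characters (assumption)

    for (_, query), ranks in clicks.items():
        lengths.append(len(query))
        clicks_per_query[len(ranks)] += 1
        if ranks:
            first_click[ranks[0]] += 1
            last_click[ranks[-1]] += 1

    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    return {
        "mean_query_length": mean_length,
        "first_click": first_click,
        "last_click": last_click,
        "clicks_per_query": clicks_per_query,
    }


# Toy data, invented for illustration only.
records = [
    ("s1", "搜索引擎", 1),
    ("s1", "搜索引擎", 3),
    ("s2", "log mining", None),
    ("s3", "天气", 2),
]
summary = summarize(records)
```

Grouping by `(session_id, query)` treats a repeated query within one session as a single query instance; a real analysis would likely need timestamps to separate repeated submissions.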
Key words: computer application; Chinese information processing; user behavior analysis; search engine; web information retrieval


Cite this article

CEN Rongwei, LIU Yiqun, ZHANG Min, RU Liyun, MA Shaoping. Search Engine User Behavior Analysis Based on Log Mining. Journal of Chinese Information Processing. 2010, 24(3): 49-55


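Funding

Supported by the National Natural Science Foundation of China (60736044, 60903107) and the Specialized Research Fund for the Doctoral Program of Higher Education (20090002120005)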