用户行为分析是网络信息检索技术得以前进的重要基石,也是能够在商用搜索引擎中发挥重要作用的各种算法的基本出发点之一。为了更好地理解中文搜索用户的检索行为,本文对搜狗搜索引擎在一个月内的近5 000万条查询日志进行了分析。我们从独立查询词分布、同一session内的用户查询习惯及用户是否使用高级检索功能等方面对用户行为进行了分析。分析结论对于改进中文搜索引擎的检索算法和更准确地评测检索效果都有较好的指导意义。
Abstract
User behavior analysis is an important foundation for progress in Web information retrieval and a starting point for many of the algorithms used in commercial search engines. To better understand the search behavior of Chinese Web users, we analyze a Sogou search engine query log of nearly 50 million entries collected over one month. The analysis covers the distribution of unique queries, users' query habits within a single session, and whether users employ advanced search features. The conclusions can help improve retrieval algorithms for Chinese search engines and make the evaluation of retrieval performance more accurate.
关键词
计算机应用 /
中文信息处理 /
网络信息检索 /
搜索引擎 /
用户行为分析 /
点击信息分析
Key words
computer application /
Chinese information processing /
web information retrieval /
search engine /
user behavior analysis /
click-through data analysis
基金
国家重点基础研究(973)资助项目(2004CB318108);国家自然科学基金资助项目(60223004, 60321002, 60303005, 60503064);教育部科学技术研究重点资助项目(104236)