Query Results Caching and Prefetching in Web Search Engines Based on User Characteristics

MA Hongyuan¹,², WANG Bin¹

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 6: 19-27.
Review

Abstract

Query results caching and prefetching are crucial to the efficiency of Web search engines. Unlike traditional methods, which are built on query characteristics, this paper presents a caching and prefetching approach based on user characteristics. The approach improves search engine performance overall, with especially large gains for certain groups of users. An analysis of how much each user contributes to the query stream of a well-known Chinese commercial Web search engine shows that these contributions follow a long-tail distribution. Building on this property, we design a query results prediction model that is used both for prefetching and for partitioning the cache. We evaluate the approach on two months of large-scale, real user query logs from the same search engine, comparing it with typical traditional query-based methods and with theoretical upper bounds. Compared with these state-of-the-art methods, the approach increases the hit rate by 3.03% to 4.17% over all requests, and by 20.52% to 28.2% for the 0.25% of users who contribute the most queries.
Key words

query results cache / user characteristics / performance optimization
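The abstract does not spell out the prediction model or the partitioning rule, so the following is only a minimal sketch, under stated assumptions, of the two ideas it names: measuring how concentrated requests are among users, and reserving part of the cache for the heaviest users. The LRU replacement policy, the single shared partition for heavy users, the partition sizes, the toy log, and every function name (`top_users`, `request_share`, `partitioned_hit_rate`, and so on) are hypothetical illustrations, not the authors' method. Prefetching is not modeled.

```python
# Hypothetical sketch: a user-partitioned query results cache compared with a
# shared LRU cache of the same total size. Not the paper's actual model.
from collections import Counter, OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache; keys are query strings, values stand in for result pages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, query):
        """Return True on a hit. On a miss, cache the query, evicting the LRU entry if full."""
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return True
        if self.capacity > 0:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
            self.entries[query] = f"<results for {query!r}>"  # placeholder result page
        return False


def top_users(log, fraction):
    """Rank users by request count and keep the top `fraction` of them (at least one)."""
    counts = Counter(user for user, _ in log)
    k = max(1, int(len(counts) * fraction))
    return [user for user, _ in counts.most_common(k)]


def request_share(log, users):
    """Fraction of all requests in `log` issued by the given users."""
    users = set(users)
    return sum(1 for user, _ in log if user in users) / len(log)


def replay(log, route):
    """Replay (user, query) requests; `route(user)` returns the cache that serves that user."""
    hits = sum(route(user).lookup(query) for user, query in log)
    return hits / len(log)


def shared_hit_rate(log, capacity):
    """Hit rate of one LRU cache shared by all users."""
    cache = LRUCache(capacity)
    return replay(log, lambda user: cache)


def partitioned_hit_rate(log, heavy, heavy_capacity, other_capacity):
    """Hypothetical split: heavy users share one partition, everyone else shares another."""
    heavy = set(heavy)
    heavy_cache = LRUCache(heavy_capacity)
    other_cache = LRUCache(other_capacity)
    return replay(log, lambda user: heavy_cache if user in heavy else other_cache)


if __name__ == "__main__":
    # Toy log: one heavy user alternates two queries; light users send one-off queries.
    log = [("u1", "a"), ("u2", "x1"), ("u1", "b"), ("u3", "x2"), ("u1", "a"),
           ("u4", "x3"), ("u1", "b"), ("u5", "x4"), ("u1", "a")]
    heavy = top_users(log, fraction=0.2)  # a real evaluation would use an earlier log window
    print(heavy, request_share(log, heavy))  # ['u1'] 0.5555555555555556
    print(shared_hit_rate(log, capacity=3))  # 0.0
    print(partitioned_hit_rate(log, heavy, heavy_capacity=2, other_capacity=1))  # 0.3333333333333333
```

With the same total capacity of three entries, the shared cache gets no hits on this toy log because the one-off queries keep evicting the heavy user's repeated queries, while the partitioned cache protects them and reaches a hit rate of 1/3. In a real evaluation the heavy-user set would be chosen from an earlier portion of the log than the one being replayed, so that the hit rate is not measured on the same data used to select those users.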

Cite this article

MA Hongyuan, WANG Bin. Query Results Caching and Prefetching in Web Search Engines Based on User Characteristics. Journal of Chinese Information Processing, 2012, 26(6): 19-27.

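Funding

Supported by the National Natural Science Foundation of China (60873166), the National Basic Research Program of China (973 Program, 2007CB311103), the National High Technology Research and Development Program of China (863 Program, 2006AA010105), and the Key Project of Science and Technology Research of the Ministry of Education of China (109028).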