基于阈值的快速启动Top-k查询处理算法

PDF(2204 KB)

中文信息学报 ›› 2017, Vol. 31 ›› Issue (5) : 163-170.

信息检索与问答系统

基于阈值的快速启动Top-k查询处理算法

江宇^1,2,宋省身²,杨岳湘³,姜琨⁴

作者信息 +

Rapid Start Top-k Query Based on Threshold

JIANG Yu^{1, 2}, SONG Xingshen², YANG Yuexiang³, JIANG Kun⁴

Author information +

History +

摘要

Top-k查询是搜索引擎领域广泛应用的技术之一,该算法从海量数据中返回最符合用户需求的前k 个结果,在执行时能避免对大部分无关文档的打分处理。Top-k 查询虽然极大提升了查询性能,但其存在的慢启动问题并未得到有效解决。为此,该文首先提取倒排索引的静态Top-k信息,再动态计算针对具体查询词项的初始阈值,在此基础上,结合MaxScore和WAND算法,提出了快速启动的Top-k查询处理算法。实验结果表明,该方法能够有效解决上述问题,具有良好的性能。

Abstract

Top-k query is a popular technique of search engines, which returns the most relative results for user from massive data. Although Top-k query significantly improves the performance of the system, its slow-start issue has not been effectively resolved. This paper extracts static Top-k information of inverted index and then calculats initial threshold in real time for specific query. On this basis, this paper presents a rapid start algorithm of Top-k query for the current state-of-art methods MaxScore and WAND. Experimental results show that the proposed approach achieves better performance.

导出引用

江宇,宋省身,杨岳湘,姜琨. 基于阈值的快速启动Top-k查询处理算法. 中文信息学报. 2017, 31(5): 163-170

JIANG Yu, SONG Xingshen, YANG Yuexiang, JIANG Kun. Rapid Start Top-k Query Based on Threshold. Journal of Chinese Information Processing. 2017, 31(5): 163-170

参考文献

[1] Turtle H, Flood J. Query evaluation:strategies andoptimizations[J]. Information Processing & Management, 1995, 31(6):831-850.
[2] Broder A Z, Carmel D, Herscovici M, et al. Efficient query evaluation using a two-level retrieval process[C]// Proceedings of the 12th International Conference on Information and Knowledge Management. ACM, 2003:426-434.
[3] Moffat A, Zobel J. Self-indexing inverted files for fast text retrieval[J]. ACM Transactions on Information Systems (TOIS), 1996, 14(4):349-379.
[4] Jonassen S, Bratsberg S E. Efficient compressed inverted index skipping for disjunctive text-queries[M]. Advances in Information Retrieval. Springer Berlin Heidelberg, 2011:530-542.
[5] Yan H, Ding S, Suel T. Inverted index compression and query processing with optimized document ordering[C]//Proceedings of the 18th International Conference on World Wide Web. ACM, 2009:401-410.
[6] Ricardo B Y, Berthier R N. Modern information retrieval:the concepts and technology behind search (2nd Edition)[M]. Addision Wesley, 2011, 84:2.
[7] Strohman T, Turtle H, Croft W B. Optimization strategies for complex queries[C]//Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005:219-225.
[8] Rossi C, de Moura E S, Carvalho A L, et al. Fast document-at-a-time query processing using two-tier indexes[C]//Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2013:183-192.
[9] Ding S, Suel T. Faster Top-k document retrieval using block-max indexes[C]//Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011:993-1002.
[10] de Carvalho L L S, de Moura E S, Daoud C M, et al. Heuristics to improve the BMW method and its variants[J]. Journal of Information and Data Management, 2016, 6(3):178.
[11] Jonassen S, Bratsberg S E. Efficient compressed inverted index skipping for disjunctive text-queries[M]. Advances in Information Retrieval. Springer Berlin Heidelberg, 2011:530-542.

基金

湖南省自然科学基金(2016JJ2007)

PDF(2204 KB)

764

Accesses

Citation

Detail

段落导航

摘要
Abstract
关键词
Key words
引用本文
参考文献
基金

Received	Published
2016-08-16	2017-10-16
Issue Date
2017-10-16

选择文件类型/文献管理软件名称

选择包含的内容

摘要

Abstract

关键词

Key words

引用本文

{{custom_sec.title}}

{{custom_sec.title}}

参考文献

{{custom_fnGroup.title_cn}}

脚注

基金