Temporal Sensitive Learning to Rank Method for Microblog Search

WANG Shuxin, WEI Bingjie, LU Xiao, WANG Bin

Journal of Chinese Information Processing, 2015, Vol. 29, Issue 4: 175-182.
Section: Information Retrieval and Question Answering Systems



Abstract

Microblog search has become an active research topic in information retrieval in recent years, and related work shows that most microblog search queries are time-sensitive. Existing methods build on different time-sensitivity assumptions, for example "the newer a document is, the more relevant it is" or "the closer a document is to the peak time of a topic, the more relevant it is", and each of them improves retrieval effectiveness to some extent. However, these assumptions come mainly from observation: they are intuitive simplifications that capture only one aspect of how time affects microblog ranking. In this paper, an analysis of the temporal distributions of relevant documents across different queries shows that the role of time in microblog ranking is complex, so such simple assumptions cannot describe it accurately. Building on this finding, we propose TLTR, a time-sensitive learning to rank model for microblog search that is trained on temporal and textual features of query-document pairs. For temporal features, we extract both global features of the query and local features of each query-document pair. Experimental results on the TREC Microblog Track 2011–2012 data sets show that TLTR significantly outperforms existing time-aware microblog ranking methods.
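The abstract contrasts two hand-crafted assumptions (recency, and proximity to a topic's peak time) with a model that learns how much each temporal signal should count. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' TLTR implementation. Every name in it (`recency`, `peak_proximity`, `temporal_profile`, `features`, `train_pairwise`, `toy_query`) is hypothetical, as are the feature definitions (exponential decay rates, a 1-hour histogram bin for locating the peak, the share of results in that bin as a query-level "burstiness" feature), the RankSVM-style pairwise hinge-loss learner trained by SGD, and the invented toy data in which relevant tweets cluster around an older event while the newest tweets are off-topic.

```python
# Hypothetical sketch: feature definitions, parameters and data are illustrative assumptions.
import math
import random

def recency(doc_time, query_time, lam=0.05):
    """Assumption 1 ('newer is more relevant'): exponential decay in document age (hours)."""
    return math.exp(-lam * (query_time - doc_time))

def peak_proximity(doc_time, peak, lam=0.5):
    """Assumption 2 ('closer to the peak time is more relevant')."""
    return math.exp(-lam * abs(doc_time - peak))

def temporal_profile(times, bin_width=1.0):
    """Global (query-level) features from the timestamps of the retrieved documents:
    centre of the most populated time bin, and the share of documents inside it."""
    counts = {}
    for t in times:
        b = math.floor(t / bin_width)
        counts[b] = counts.get(b, 0) + 1
    best = max(counts, key=lambda b: (counts[b], b))  # ties go to the later bin
    return (best + 0.5) * bin_width, counts[best] / len(times)

def features(text_score, doc_time, query_time, peak, burst):
    """Hypothetical feature vector for one query-document pair."""
    rec = recency(doc_time, query_time)
    prox = peak_proximity(doc_time, peak)
    # A query-level feature has the same value for every document of the query, so in a
    # linear pairwise model it cancels out; here it acts through interactions instead.
    return [text_score, rec, prox, burst * prox, (1.0 - burst) * rec]

def train_pairwise(queries, epochs=200, lr=0.05, reg=1e-3, seed=0):
    """RankSVM-style linear model: pairwise hinge loss minimised by SGD.
    `queries` is a list of queries, each a list of (feature_vector, relevance_label)."""
    rng = random.Random(seed)
    pairs = [(a[0], b[0]) for docs in queries
             for a in docs for b in docs if a[1] > b[1]]
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [x - y for x, y in zip(better, worse)]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            for k in range(len(w)):
                # Subgradient of reg/2 * ||w||^2 + max(0, 1 - w . diff)
                grad = reg * w[k] - (diff[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w

def score(w, f):
    """Linear ranking score of one feature vector."""
    return sum(wk * fk for wk, fk in zip(w, f))

def toy_query(query_time=100.0):
    """Invented data: relevant tweets cluster around an event 50 h before the query,
    while the newest tweets are off-topic chatter."""
    times = [50.2, 50.4, 50.6, 50.8, 99.0, 98.0, 10.0]
    labels = [1, 1, 1, 1, 0, 0, 0]
    peak, burst = temporal_profile(times)
    docs = [(features(1.0, t, query_time, peak, burst), y) for t, y in zip(times, labels)]
    return times, docs

if __name__ == "__main__":
    times, docs = toy_query()
    w = train_pairwise([docs])
    ranked = sorted(zip(times, docs), key=lambda p: score(w, p[1][0]), reverse=True)
    print([(t, y) for t, (_, y) in ranked])
```

On the toy query, ranking by recency alone puts an off-topic tweet first, while the learned weights favour proximity to the peak and place all four relevant tweets above the others. One design point worth noting: a query-level feature takes the same value for every document of a query, so in a linear pairwise model its difference is always zero; the sketch therefore feeds global features in only through interactions with local ones, whereas a nonlinear learner could use them directly.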

Key words

time-sensitive / learning to rank / microblog search

Cite this article

WANG Shuxin, WEI Bingjie, LU Xiao, WANG Bin. Temporal Sensitive Learning to Rank Method for Microblog Search. Journal of Chinese Information Processing. 2015, 29(4): 175-182

Funding

Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)