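Online event retrieval is the task of iteratively returning, in chronological order, the documents relevant to an event query from mini-batch datasets. Its goal is to keep collecting fresh event documents along the timeline, and it forms an important basis for a range of event-related work. Conventional approaches to this task adopt advanced retrieval models to improve retrieval precision, but they do not take the characteristics of events themselves into account. To address this problem, this paper models events with two kinds of graphs (an event keyword co-occurrence graph and a bipartite graph incorporating event types) and proposes an online retrieval framework based on event graphs. A case study and experiments on two public TREC datasets show that the proposed method significantly improves event retrieval precision (P@10 increases by up to 30%, with an average increase of 5.85%), adapts to the online retrieval environment, and supports analysis of how events evolve.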
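To make the idea of an event keyword co-occurrence graph in an online setting more concrete, here is a minimal sketch. It is not the paper's method: the abstract does not specify how the graph is built or how documents are scored, so the edge weighting (number of documents containing both keywords), the query expansion rule (add each query term's `k` heaviest neighbours), and all names (`CooccurrenceGraph`, `expand_query`, `online_retrieval`, `top_n`, `k`) are hypothetical. It only illustrates the general pattern of updating a graph batch by batch in chronological order and using it to rank each new batch.

```python
from collections import defaultdict
from itertools import combinations


class CooccurrenceGraph:
    """Hypothetical undirected keyword co-occurrence graph.
    Edge weight = number of documents seen so far that contain both keywords."""

    def __init__(self):
        self.adj = defaultdict(lambda: defaultdict(int))

    def update(self, batch):
        """Fold one mini-batch of tokenized documents into the graph."""
        for doc in batch:
            for u, v in combinations(sorted(set(doc)), 2):
                self.adj[u][v] += 1
                self.adj[v][u] += 1

    def expand_query(self, query_terms, k=3):
        """Give each query term weight 1.0, then add up to k of its heaviest
        neighbours, weighted by their share of that term's total edge weight."""
        weights = {t: 1.0 for t in query_terms}
        for t in query_terms:
            nbrs = self.adj.get(t, {})  # .get avoids creating empty entries
            total = sum(nbrs.values())
            top = sorted(nbrs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
            for v, w in top:
                if v not in query_terms:
                    weights[v] = weights.get(v, 0.0) + w / total
        return weights


def score(doc, weights):
    """Sum of expanded-query weights over the distinct terms of a document."""
    return sum(weights.get(t, 0.0) for t in set(doc))


def online_retrieval(batches, query_terms, top_n=2, k=3):
    """Process mini-batches in chronological order and return, for each
    batch, the indices of its top_n highest-scoring documents."""
    graph = CooccurrenceGraph()
    results = []
    for batch in batches:
        graph.update(batch)  # graph now reflects every document seen so far
        weights = graph.expand_query(query_terms, k)
        ranked = sorted(range(len(batch)), key=lambda i: -score(batch[i], weights))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    batches = [
        [["earthquake", "nepal", "rescue"], ["football", "match"], ["earthquake", "aftershock"]],
        [["rescue", "nepal", "aid"], ["stock", "market"]],
    ]
    print(online_retrieval(batches, ["earthquake"]))  # [[0, 2], [0, 1]]
```

In the second batch, the document about rescue and aid in Nepal never mentions "earthquake", yet it outranks the unrelated document because the graph built from earlier batches links those keywords to the event.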
Abstract
Online event retrieval is a retrieval task for event queries that iteratively returns important event-related documents from mini-batch datasets in chronological order. This paper proposes an online event retrieval framework based on two kinds of graphs: an event keyword co-occurrence graph and a bipartite graph incorporating event types. A case study and experiments on two public TREC corpora indicate that our approach significantly improves event retrieval precision (a maximum increase of 30% and an average increase of 5.85% in P@10).
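The results above are reported in P@10, the fraction of the top 10 retrieved documents that are relevant. The sketch below shows that standard definition; the helper names and the input layout (`runs` mapping query ids to ranked document ids, `qrels` mapping query ids to relevant document ids) are assumptions for illustration, and the abstract does not say whether the paper averages per query or per time step.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """P@k: share of the top-k retrieved documents that are relevant.
    Divides by k even when fewer than k documents were returned."""
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs, qrels, k=10):
    """Average P@k over all queries in `runs`.
    runs:  query id -> ranked list of document ids (hypothetical layout)
    qrels: query id -> collection of relevant document ids"""
    if not runs:
        return 0.0
    total = sum(precision_at_k(ranked, qrels.get(q, ()), k) for q, ranked in runs.items())
    return total / len(runs)


if __name__ == "__main__":
    runs = {"q1": [f"d{i}" for i in range(1, 13)], "q2": ["d2", "d4", "d6"]}
    qrels = {"q1": {"d1", "d3", "d5", "d11"}, "q2": {"d2", "d4"}}
    print(precision_at_k(runs["q1"], qrels["q1"]))  # 0.3
    print(mean_precision_at_k(runs, qrels))
```

For "q1", "d11" is relevant but ranked 11th, so it does not count toward P@10; for "q2", only three documents were returned, but the score is still divided by 10.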
Keywords
event graph /
online event retrieval /
event query model /
event evolution
Key words
event graph /
online event-based retrieval /
event query model /
event development
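Funding
National Natural Science Foundation of China (6157050517); Sub-project of a Key Special Project of the Ministry of Science and Technology of China (2016YFB0801003)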