Topic Detection and Tracking Review

HONG Yu, ZHANG Yu, LIU Ting, LI Sheng

Journal of Chinese Information Processing ›› 2007, Vol. 21 ›› Issue (6): 71-87.
Review


Abstract

Topic detection and tracking (TDT) is an information processing technology that identifies previously unknown topics and tracks known topics in streams of news media. Since the pilot study in 1996, several large-scale evaluations in this field have provided a new test bed for related technologies such as information recognition, collection, and organization. Because TDT has much in common with other natural language processing technologies such as information retrieval, data mining, and information extraction, and because it targets news corpora characterized by bursty and continuing events, it has gradually become a research focus in information processing. This paper briefly introduces the research background, task definitions, evaluation methods, and related techniques of TDT, and discusses future trends by analyzing the current state of research in the field.

Key words

computer application / Chinese information processing / overview / topic detection and tracking / natural language processing / event / news story

Cite this article

HONG Yu, ZHANG Yu, LIU Ting, LI Sheng. Topic Detection and Tracking Review. Journal of Chinese Information Processing. 2007, 21(6): 71-87






Funding

Supported by the National Natural Science Foundation of China (60435020, 60575042, 60503072)