Abstract: Topic detection and tracking (TDT) is a natural language processing technology that aims to detect previously unknown topics and to track known topics in streams of news media. Since the pilot study in 1996, several large-scale evaluation conferences have provided a good environment for evaluating technologies for recognizing, collecting, and organizing topics. Because TDT shares many challenges with information retrieval, data mining, and information extraction when handling data that arrive in bursts and in continuous succession, it has become an active research area in natural language processing. This paper introduces the background, definitions, evaluation, and methods of TDT, and explores its future development trends by analyzing current research.