The Study of Topic Detection Based on Algorithm of Division and Multi-level Clustering with Multi-strategy Optimization

LUO Wei-hua, YU Man-quan, XU Hong-bo, WANG Bin, CHENG Xue-qi

Journal of Chinese Information Processing, 2006, Vol. 20, Issue (1): 31-38.

Abstract

Topic Detection and Tracking (TDT) is an evaluation-driven research area that aims to organize and exploit streams of text according to the events they report. Since it was proposed in 1996, it has attracted increasingly wide attention. Building on a study of established algorithms, this paper proposes a topic detection algorithm based on division and multi-level clustering. The core idea is to partition the full data set into groups whose members are related to one another, cluster each group separately to obtain the topics inside it (micro-clusters), and then cluster all the micro-clusters to obtain the final topics. Several optimization strategies are applied during clustering to safeguard the quality of the result. A system built on the algorithm was tested on the TDT4 Chinese corpus, and the results indicate that the algorithm is among the best-performing algorithms currently available.
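The two-level flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how groups are formed, how documents are represented, or which optimization strategies are used, so the sketch assumes consecutive fixed-size groups, whitespace-tokenized term counts, cosine similarity, and a single merge threshold, and it omits the optimization strategies entirely. The names `detect_topics`, `agglomerate`, `group_size`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(items, threshold):
    """Greedy agglomerative clustering over (vector, members) pairs.

    Repeatedly merges the most similar pair of clusters until no pair
    reaches `threshold`. A merged cluster is represented by the sum of
    its term counts (an assumption made for this sketch).
    """
    clusters = [(Counter(v), list(m)) for v, m in items]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][0], clusters[j][0])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)
    return clusters


def detect_topics(docs, group_size, threshold):
    """Divide, cluster each group into micro-clusters, then cluster those."""
    # Assumption: documents are represented by raw whitespace-token counts.
    vectors = [(Counter(text.lower().split()), [idx])
               for idx, text in enumerate(docs)]
    micro = []
    # Assumption: groups are consecutive slices of the input stream.
    for start in range(0, len(vectors), group_size):
        micro.extend(agglomerate(vectors[start:start + group_size], threshold))
    # Second level: cluster the micro-clusters to obtain final topics.
    topics = agglomerate(micro, threshold)
    return sorted(sorted(members) for _, members in topics)
```

For six short documents that alternate between an earthquake story and an election story, calling `detect_topics(docs, group_size=3, threshold=0.5)` first forms micro-clusters inside each group of three and then merges micro-clusters that describe the same event across groups. Clustering within small groups first keeps the pairwise comparisons at each level far fewer than clustering all documents at once.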

Key words

computer application / Chinese information processing / topic detection and tracking / division and multi-level clustering / hierarchical clustering

Cite this article

LUO Wei-hua, YU Man-quan, XU Hong-bo, WANG Bin, CHENG Xue-qi. The Study of Topic Detection Based on Algorithm of Division and Multi-level Clustering with Multi-strategy Optimization. Journal of Chinese Information Processing. 2006, 20(1): 31-38


Funding

National 973 Program of China (2004CB318109)