Adaptive Incremental K-means Algorithm for Topic Detection
LI Shengdong1, LV Xueqiang2, SHI Shuicai2, SUN Jun3
1. Department of Computer Engineering, Langfang Yanjing Polytechnic College, Langfang, Hebei 065200, China; 2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 3. North China Institute of Aerospace Engineering, Langfang, Hebei 065000, China
Abstract: Starting from the definition and characteristics of topic detection, this paper analyzes the advantages and disadvantages of the traditional incremental clustering algorithm and the K-means algorithm, and proposes an adaptive incremental K-means algorithm for topic detection. Experimental results show that the new algorithm improves topic detection performance.
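The abstract does not spell out the algorithm itself, so the following is only a generic sketch of how the two baseline techniques it names could be combined, not the authors' method. It assumes documents are already represented as equal-length term-weight vectors, uses cosine similarity, and lets a single-pass incremental step decide the number of clusters before a K-means-style refinement. The names `incremental_pass`, `kmeans_refine`, and the `threshold` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def incremental_pass(docs, threshold):
    """Single-pass incremental clustering (assumed baseline).

    Each document joins its most similar existing cluster if the similarity
    reaches `threshold`; otherwise it opens a new cluster.
    """
    centroids, members = [], []
    for d in docs:
        sims = [cosine(d, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(d)
            centroids[best] = mean(members[best])
        else:
            centroids.append(list(d))
            members.append([d])
    return centroids


def kmeans_refine(docs, centroids, iterations=10):
    """K-means-style refinement seeded with the incremental centroids.

    K is the number of clusters found by the incremental pass, not a
    value fixed in advance (an assumption of this sketch).
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(d, centroids[k]))
            for d in docs
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            assigned = [d for d, lab in zip(docs, labels) if lab == k]
            if assigned:  # keep the old centroid if a cluster goes empty
                centroids[k] = mean(assigned)
    return labels, centroids


# Toy usage: four 2-D "documents" forming two obvious topics.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
seeds = incremental_pass(docs, threshold=0.8)
labels, centroids = kmeans_refine(docs, seeds)
```

In this sketch the incremental pass removes the need to choose K by hand, while the refinement step reduces the dependence on document arrival order that a purely single-pass method has.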