CHEN Xiaorong, LIU Zuoguo. Research on Reorganization of Text Clustering Results[J]. Journal of Chinese Information Processing, 2016, 30(2): 189-195.
Research on Reorganization of Text Clustering Results
CHEN Xiaorong, LIU Zuoguo
(College of Computer Science & Technology, Guizhou University, Guiyang, Guizhou 550025, China)
Abstract: This paper presents a distance-oriented reorganization strategy in which clusters can be reorganized independently of the clustering process. The concept of the Nearest Domain is proposed and the Nearest Domain rules are elaborated. A Gauss Weighting Algorithm is then designed to re-weight each text according to its distance from the cluster kernel. Finally, the Nearest Domain Weights are used to split sparse clusters and adjust abnormal texts while merging similar clusters. Clustering experiments show that the reorganization process effectively improves accuracy and recall, and makes the results more reasonable by increasing the inner density of clusters.
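The abstract does not give the weighting formula or the Nearest Domain rules, so the following is only a minimal sketch of the general idea of distance-based re-weighting, not the authors' method. It assumes a Gaussian weight of the form w = exp(-d^2 / (2 sigma^2)), Euclidean distance on term vectors, the cluster centroid as a stand-in for the "cluster kernel", and a hypothetical threshold below which a text is treated as abnormal and moved to the cluster with the nearest kernel. The names `gaussian_weights`, `reassign_outliers`, `sigma`, and `threshold` are illustrative, and cluster splitting and merging are not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean; assumed here to approximate the cluster kernel.
    Expects a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def gaussian_weights(points: List[Vector], kernel: Vector, sigma: float) -> List[float]:
    """Assumed Gaussian weight exp(-d^2 / (2 sigma^2)), d = distance to kernel."""
    return [math.exp(-euclidean(p, kernel) ** 2 / (2 * sigma ** 2)) for p in points]


def reassign_outliers(
    clusters: List[List[Vector]], sigma: float, threshold: float
) -> List[List[Vector]]:
    """Hypothetical adjustment rule: a text whose weight in its own cluster is
    below `threshold` is moved to the cluster whose kernel is nearest to it
    (which may be its own cluster). All input clusters must be non-empty."""
    kernels = [centroid(c) for c in clusters]
    new_clusters: List[List[Vector]] = [[] for _ in clusters]
    for i, cluster in enumerate(clusters):
        weights = gaussian_weights(cluster, kernels[i], sigma)
        for p, w in zip(cluster, weights):
            if w >= threshold:
                new_clusters[i].append(p)  # well inside its cluster: keep it
            else:
                nearest = min(range(len(kernels)), key=lambda k: euclidean(p, kernels[k]))
                new_clusters[nearest].append(p)
    return new_clusters


if __name__ == "__main__":
    clusters = [[(0, 0), (0, 1), (1, 0), (9, 9)], [(10, 10), (10, 11), (11, 10)]]
    print(reassign_outliers(clusters, sigma=3.0, threshold=0.1))
```

With `sigma=3.0` and `threshold=0.1`, the far point `(9, 9)` receives a weight of roughly 0.009 in its original cluster and is moved to the second cluster, whose kernel is much closer. Weighting rather than a hard distance cut-off lets the same numbers be reused for other decisions, such as judging how dense a cluster is.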