1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China; 3. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100190, China
Abstract: Because web texts and topics are diverse in content and evolve dynamically, traditional topic detection and tracking models have difficulty producing high-quality subtopics for them. To address this issue, a subtopic partition algorithm based on an absorbing Markov chain is proposed. The algorithm first gathers the topic keywords clustered from the web pages to generate initial subtopics, and then derives the subtopics using the absorbing Markov chain. Experimental results show that the algorithm performs well in terms of both significance and diversity.
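The abstract does not spell out how the absorbing Markov chain is applied, so the following is only a minimal sketch of one common way such a chain can trade off significance against diversity, not the algorithm proposed here. It assumes a pairwise similarity matrix between subtopics and a per-subtopic significance score are already available. The names `row_normalize`, `expected_visits`, `rank_subtopics`, `weights`, and `scores` are hypothetical. The most significant subtopic is picked first; each picked subtopic is then turned into an absorbing state, and the next pick is the remaining subtopic that a random walk is expected to visit most often before being absorbed.

```python
def row_normalize(weights):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total for w in row])
    return normalized


def expected_visits(P, absorbed, tol=1e-12, max_iter=10000):
    """Expected visits to each transient state before absorption.

    The walk starts uniformly over the transient (not yet selected) states.
    The sum u + uQ + uQ^2 + ... is accumulated iteratively, where Q is the
    transient-to-transient part of P.
    """
    transient = [i for i in range(len(P)) if i not in absorbed]
    mass = {i: 1.0 / len(transient) for i in transient}
    visits = {i: 0.0 for i in transient}
    for _ in range(max_iter):
        for i in transient:
            visits[i] += mass[i]
        # One step of the walk; mass that reaches an absorbed state is lost.
        mass = {j: sum(mass[i] * P[i][j] for i in transient) for j in transient}
        if sum(mass.values()) < tol:
            break
    return visits


def rank_subtopics(weights, scores, k):
    """Pick k subtopics: the top-scored one first, then by expected visits."""
    P = row_normalize(weights)
    n = len(P)
    ranked = [max(range(n), key=lambda i: scores[i])]
    while len(ranked) < min(k, n):
        visits = expected_visits(P, set(ranked))
        ranked.append(max(visits, key=visits.get))
    return ranked


# Toy input (assumed): subtopics 0 and 1 are near-duplicates, as are 2 and 3.
weights = [
    [1.0, 0.9, 0.1, 0.3],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.3, 0.1, 0.8, 1.0],
]
scores = [0.9, 0.8, 0.5, 0.4]
print(rank_subtopics(weights, scores, 2))
```

In this toy input, subtopic 1 is strongly tied to the already selected subtopic 0, so a walk starting there is absorbed quickly and accumulates few visits. The second pick therefore comes from the other cluster, even though subtopic 1 has the higher significance score.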