ZHOU Zhenyu, LI Fang. Comparing Topics from Microblog and News Media about Specific Events[J]. Journal of Chinese Information Processing, 2014, 28(1): 47-55.
Comparing Topics from Microblog and News Media about Specific Events
ZHOU Zhenyu, LI Fang
Sino-German Joint Laboratory for Language Technology, Department of Computer Science and Engineering, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Abstract: This work presents a contrastive study of the topics that microblogs and news media produce about specific events. We first use LDA to extract topics from the two media, and then define three indexes (attention factor, diversity factor, and evolution factor) to support an improved topic discrepancy calculation. Finally, we choose four events of different types for experiments and analysis. The results show: 1) Microblogs contain more comment topics, whose attention factors are close to one another, whereas news media contain a high proportion of factual topics with widely varying attention factors. 2) In both microblogs and news media, the diversity factor of the words used in comment topics is larger than that in factual topics. 3) In microblogs, comment topics last longer with consistent content, whereas in news media it is the factual topics that do so.