Abstract
This paper presents a topic-level contrastive study of news reports and microblog posts about specific events. We first use the LDA topic model to extract the topics that each medium carries about a given event. We then define three indexes, the attention factor, the diversity factor, and the evolution factor, give formulas for computing them, and improve the method for computing the topic discrepancy between different media. Finally, we select four events of different types for comparative experiments and analysis. The results show that, for the same event: 1) microblogs carry more comment topics, whose attention factors are close to one another, whereas news reports carry more factual topics, whose attention factors vary widely; 2) the lexical discrepancy between microblogs and news reports is large for comment topics and small for factual topics; 3) on microblogs, comment topics last longer and change little in content, whereas in news reports it is the factual topics that last longer and change little.
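The abstract does not reproduce the paper's formulas, so the following is only an illustrative sketch of how a lexical discrepancy between two topics might be measured. It assumes each topic is available as a word-to-probability mapping (as LDA would produce) and uses the Jensen-Shannon divergence, which is a stand-in chosen here and not the paper's improved discrepancy measure. The function names and the toy topics are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical inputs, 1 for disjoint ones."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_discrepancy(topic_a, topic_b):
    """Compare two topics given as {word: probability} dicts (assumed input format).

    Words missing from one topic are treated as having probability 0 there.
    """
    vocabulary = sorted(set(topic_a) | set(topic_b))
    p = [topic_a.get(w, 0.0) for w in vocabulary]
    q = [topic_b.get(w, 0.0) for w in vocabulary]
    return js_divergence(p, q)


# Hypothetical toy topics, not data from the paper.
microblog_topic = {"earthquake": 0.5, "pray": 0.3, "sad": 0.2}
news_topic = {"earthquake": 0.6, "rescue": 0.3, "casualties": 0.1}
score = topic_discrepancy(microblog_topic, news_topic)
```

Under this stand-in measure, a score near 0 means the two media describe the topic with nearly the same vocabulary, and a score near 1 means their vocabularies barely overlap.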
Keywords
topic model /
microblog /
news reports /
comparison
Funding
National Natural Science Foundation of China (60873134)