Emerging Topic Detection in Online Social Networks: A Survey
GOU Chengcheng1,2, DU Pan1, LIU Yue1, CHENG Xueqi1
1. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100190, China
Abstract: Emerging topic detection is one of the major research focuses in social network analysis. The openness of social networks, microblogs in particular, provides unprecedentedly favorable conditions for topics to spread and break out. Emerging topics often accompany major news or events that are about to break out and have a significant social impact. Identifying these topics at an early stage is the central problem of emerging topic detection. This survey reviews the main developments in emerging topic detection in recent years and elaborates the relevant concepts, methods, and theory. Detection methods are analyzed and discussed from the perspectives of bursty content features and information diffusion models. Finally, we conclude the paper with an exploration of future research directions.