Survey of Microblog and Chinese Microblog Information Processing

WEN Kunmei, XU Shuai, LI Ruixuan, GU Xiwu, LI Yuhua

Journal of Chinese Information Processing ›› 2012, Vol. 26 ›› Issue (6) : 27-38.
Survey



Abstract

Microblogging is a new form of social network that emerged in the Web 2.0 era. Its simple, fast operation and its interactive model of posting information anytime and anywhere have made it a highlight of the Internet. Since the American company Obvious launched Twitter, the world's first microblog service, in 2006, microblogging has grown at a remarkable pace and attracted wide attention from researchers in China and abroad. This paper first surveys the state of research on microblogs, with Twitter as the representative, covering (1) characteristics of the microblog social network, such as the structural features of the user network, user influence analysis, and the information diffusion mechanisms of the message network; (2) semantic analysis of microblog content, with emphasis on sentiment analysis; and (3) microblog applications, including event monitoring and early warning, security and privacy, and real-time search. It then outlines the state of research on Chinese microblogs, including their characteristics and knowledge discovery, and analyzes the main differences between Chinese and English microblogs. Finally, it discusses open problems in current microblog research and future research directions for Chinese microblogs.

Key words

Twitter / Chinese microblog / information processing

Cite this article

WEN Kunmei, XU Shuai, LI Ruixuan, GU Xiwu, LI Yuhua. Survey of Microblog and Chinese Microblog Information Processing. Journal of Chinese Information Processing. 2012, 26(6): 27-38


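Funding

National Natural Science Foundation of China (61173170, 60873225, 70771043); National High Technology Research and Development Program of China (863 Program) (2007AA01Z403); Natural Science Foundation of Hubei Province (2009CDB298); Fundamental Research Funds for the Central Universities (Huazhong University of Science and Technology Independent Innovation Research Fund 2011TS135, 2010MS068); CCF Chinese Information Technology Open Fund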