REN Bin, CHE Wanxiang, LIU Ting. Dependency Parsing-Based Social Media Text Mining: A Case Study in Analysis of Weibo Users' Eating Habits[J]. Journal of Chinese Information Processing (中文信息学报), 2014, 28(6): 208-215.
Dependency Parsing-Based Social Media Text Mining: A Case Study in Analysis of Weibo Users' Eating Habits
REN Bin, CHE Wanxiang, LIU Ting
Research Center for Social Computing and Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001
Abstract: In social media text mining, traditional lexicon-based methods suffer from low accuracy, and the lexicons they depend on are difficult to acquire. This paper proposes a dependency parsing-based text mining method that extracts information from social media text by applying matching rules. The method requires no lexicon, and experimental results show a substantial improvement in accuracy over the lexicon-based method. Using this method, we analyze the eating habits of Weibo users, obtaining results by gender, region, and time as well as cross-analyses of these dimensions, and present the results as word clouds.
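The abstract does not spell out the matching rules, so the following is only a minimal sketch under stated assumptions. Each sentence is a list of dependency arcs with LTP-style labels (SBV, VOB, HED, ...), and the parses are written by hand rather than produced by a parser. The trigger verbs 吃/喝 ("eat"/"drink"), the `Token` class, the toy lexicon, and the helpers `extract_foods`, `lexicon_match`, and `count_by_group` are all hypothetical illustrations, not the paper's rule set. The sketch contrasts a verb-object rule with plain lexicon lookup, then tallies the extracted foods per user group, the kind of frequency table a word cloud could be drawn from.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical trigger verbs; the paper's actual rules are not given in the abstract.
EAT_VERBS = {"吃", "喝"}


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    word: str
    head: int  # index of the head token, 0 for the root
    rel: str   # LTP-style dependency label, e.g. "HED", "SBV", "VOB"


def extract_foods(tokens):
    """Rule: return every verb-object (VOB) dependent whose head is a trigger verb."""
    by_idx = {t.idx: t for t in tokens}
    foods = []
    for t in tokens:
        head = by_idx.get(t.head)
        if t.rel == "VOB" and head is not None and head.word in EAT_VERBS:
            foods.append(t.word)
    return foods


def lexicon_match(tokens, lexicon):
    """Baseline: keep any token found in a fixed food lexicon, ignoring syntax."""
    return [t.word for t in tokens if t.word in lexicon]


def count_by_group(posts):
    """posts: iterable of (group, tokens); returns {group: Counter of extracted foods}."""
    counts = {}
    for group, tokens in posts:
        counts.setdefault(group, Counter()).update(extract_foods(tokens))
    return counts


# Hand-written toy parses (hypothetical, not real parser output).
# 我 今天 吃 了 热干面 ("I ate hot dry noodles today")
s1 = [Token(1, "我", 3, "SBV"), Token(2, "今天", 3, "ADV"), Token(3, "吃", 0, "HED"),
      Token(4, "了", 3, "RAD"), Token(5, "热干面", 3, "VOB")]
# 她 喝 奶茶 ("She drinks milk tea")
s2 = [Token(1, "她", 2, "SBV"), Token(2, "喝", 0, "HED"), Token(3, "奶茶", 2, "VOB")]
# 火锅 店 关门 了 ("The hotpot restaurant closed"): food word present, but no eating event
s3 = [Token(1, "火锅", 2, "ATT"), Token(2, "店", 3, "SBV"), Token(3, "关门", 0, "HED"),
      Token(4, "了", 3, "RAD")]

lexicon = {"火锅", "奶茶"}  # hypothetical toy food lexicon

for s in (s1, s2, s3):
    print(extract_foods(s), lexicon_match(s, lexicon))
print(count_by_group([("male", s1), ("female", s2), ("male", s3)]))
```

On these toy inputs the rule picks up 热干面 ("hot dry noodles"), which the small lexicon lacks, and ignores 火锅 ("hotpot") where it only modifies 店 ("restaurant"), while lexicon lookup does the opposite. This mirrors, in miniature, the accuracy and lexicon-coverage problems the abstract attributes to lexicon-based methods.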