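Dependency Parsing-Based Social Media Text Mining: A Case Study in Analysis of Weibo Users' Eating Habits

REN Bin, CHE Wanxiang, LIU Ting

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 6: 208-215.
Information Extraction and Text Mining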

Abstract

In social media text mining, traditional lexicon-based methods suffer from low accuracy and from the difficulty of acquiring lexicons. This paper proposes a text mining method based on dependency parsing that extracts information from social media text by rule matching. The method does not rely on lexicons, and experiments show a substantial improvement in accuracy over the lexicon-based method. Applying the method to Weibo text, we analyze characteristic eating habits along dimensions such as gender, region, and time, support cross-analysis across these dimensions, and present the results as word clouds.
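The idea of matching rules against a dependency parse can be illustrated with a minimal sketch. It is not the authors' rule set, which is not given here: the trigger verbs, the two rules, the `Token` type, the `extract_foods` function, and the relation labels `VOB` (verb-object) and `COO` (coordination) are all assumptions made for illustration, and the parse is written by hand instead of coming from a parser.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    idx: int   # 1-based position in the sentence
    word: str  # surface form
    head: int  # idx of the head token; 0 marks the root
    rel: str   # dependency relation to the head (label set is assumed)


# Hypothetical trigger verbs; the paper's actual rules are not given here.
EAT_VERBS = {"吃", "喝"}  # "eat", "drink"


def extract_foods(tokens: List[Token]) -> List[str]:
    """Return words that are objects of a trigger verb, plus their conjuncts."""
    by_idx: Dict[int, Token] = {t.idx: t for t in tokens}
    # Rule 1 (assumed): a token attached as verb-object (VOB) to a trigger verb.
    object_ids = {
        t.idx
        for t in tokens
        if t.rel == "VOB"
        and t.head in by_idx
        and by_idx[t.head].word in EAT_VERBS
    }
    # Rule 2 (assumed): a token coordinated (COO) with an object from rule 1.
    object_ids |= {
        t.idx for t in tokens if t.rel == "COO" and t.head in object_ids
    }
    return [by_idx[i].word for i in sorted(object_ids)]


# Hand-written parse of "我 今天 吃 了 饺子 和 面条"
# ("I ate dumplings and noodles today"); a real pipeline would call a parser.
sentence = [
    Token(1, "我", 3, "SBV"),
    Token(2, "今天", 3, "ADV"),
    Token(3, "吃", 0, "HED"),
    Token(4, "了", 3, "RAD"),
    Token(5, "饺子", 3, "VOB"),
    Token(6, "和", 7, "LAD"),
    Token(7, "面条", 5, "COO"),
]

foods = extract_foods(sentence)
print(foods)
```

Because the rules key on the syntactic relation to the verb and not on a food word list, the sketch picks up 饺子 (dumplings) and 面条 (noodles) without any lexicon of food names. Counting such extractions per user group would give the frequencies a word cloud needs.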

Key words

dependency parsing / text mining / social media / eating habits analysis

Cite this article

REN Bin, CHE Wanxiang, LIU Ting. Dependency Parsing-Based Social Media Text Mining: A Case Study in Analysis of Weibo Users' Eating Habits. Journal of Chinese Information Processing. 2014, 28(6): 208-215

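Funding

National Basic Research Program of China (973 Program) (2014CB340503); National Natural Science Foundation of China, General Program (61370164); National Natural Science Foundation of China, Key Program (61133012)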