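The development of social media has opened a new avenue for detecting users with depression. Existing studies typically rely on users' behavioral data or publicly posted text from social network platforms such as Twitter and Weibo; few make use of data from relatively private social networks such as WeChat Moments and QQ Zone. Intuitively, such quasi-private social network data should better reflect a user's mental health status. This paper discusses the feasibility of detecting depressive users from quasi-private social network text data, covering the selection of training samples, feature quantification methods, the choice of detection model, and classification performance under different text features. Experiments show that classifiers trained on samples selected by balanced high/low grouping outperform those trained on unbalanced high/low-group samples or on discretized high/low-group samples; that quantifying features by Z-score standardization works better than using raw frequency or normalized frequency directly; and that the stochastic gradient descent (SGD) model outperforms the other classifiers used for comparison, such as the support vector machine (SVM). The experiments also find that, relative to text features such as bag-of-words and word vectors, topic features work comparatively well, bringing the F-measure of the social network user depression detection model to 0.753 and the precision of detecting depressive users to 0.813.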
Abstract
The development of social networks has provided a new perspective for detecting depressive users. Little work has been done on detecting depressive users from relatively private social network data, such as WeChat Moments or QQ Zone. This paper discusses the feasibility of detecting depressive users from quasi-private social network text data, including the selection of training samples, feature quantification, and the choice of detection model. The experimental results show that, first, to train an effective model and overcome the challenge of unbalanced samples, one should select roughly equal numbers of positive and negative samples with the highest and the lowest self-report test scores, which correspond to the most depressive and the most normal users. Second, features should be quantified by Z-score standardized frequency, which is more effective than the other two quantification methods, raw frequency and normalized frequency. Third, the SGD classifier performs better than the other classifiers compared, such as SVM. The results also show that, compared with other features such as bag-of-words or word vectors, topical features perform better, with a detection precision of 0.813 and an F-measure of 0.753.
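As a rough illustration of two steps named in the abstract, the sketch below computes normalized frequency and Z-score standardized features, and selects balanced high/low groups from self-report scores. It is not the authors' implementation: the function names, the toy data, and the choice to standardize each feature column across users are assumptions made for the example.

```python
from statistics import mean, pstdev


def normalized_frequency(counts):
    """Turn each user's raw feature counts into within-user proportions."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


def zscore_standardize(matrix):
    """Standardize every feature column to zero mean and unit variance.

    Assumption: statistics are computed per feature across all users.
    Constant columns (zero standard deviation) are mapped to 0.0.
    """
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]


def balanced_high_low(scores, k):
    """Pick the k highest- and k lowest-scoring users (equal group sizes)."""
    if k < 1 or 2 * k > len(scores):
        raise ValueError("k must allow two non-overlapping groups")
    ranked = sorted(scores, key=scores.get)  # ascending by scale score
    return ranked[-k:], ranked[:k]


# Hypothetical toy data: rows are users, columns are feature counts.
counts = [[2, 0, 2], [1, 1, 2], [0, 2, 4]]
freq = normalized_frequency(counts)
zfreq = zscore_standardize(counts)

# Hypothetical self-report scale scores keyed by user id.
scores = {"a": 30, "b": 5, "c": 18, "d": 41, "e": 9}
high_group, low_group = balanced_high_low(scores, k=2)
```

Here `high_group` and `low_group` would serve as the positive and negative training samples, and `zfreq` as the feature matrix passed to a classifier. Which scale, group size, and feature set to use are left open in this sketch.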
Keywords
quasi-private social network text /
depressive user detection /
feasibility analysis
Key words
quasi-private social text /
depressive user detection /
feasibility analysis
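Funding
National Natural Science Foundation of China (61762042, 61363039, 61562032, 61662027, 61462037); Jiangxi Provincial Science and Technology Landing Project (KJLD14035); Natural Science Foundation of Jiangxi Province (20171BAB202021)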