Social Bot Account Detection on Microblog: A Survey

ZHANG Xuan, LI Baobin

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 12: 1-15.
Survey

Abstract

Microblogs are an important platform for information exchange, and the bot accounts active on them significantly affect both information dissemination and the formation of public opinion. Studying how to detect bot accounts on microblogs, and on that basis identifying and handling those accounts and the harmful content they post, can curb and eliminate their adverse effects and is of great importance for cyberspace governance. This paper systematically reviews recent research on bot account detection on microblogs, especially on Twitter and Weibo. It lists the common methods for data acquisition and feature extraction, describes in detail the detection models based on statistical methods, classical machine learning methods, and deep learning methods and evaluates their performance, analyzes the problems and challenges that bot account detection currently faces, and outlines directions for future research.

Key words

microblog / social bot / machine learning / deep learning

Cite this article

ZHANG Xuan, LI Baobin. Social Bot Account Detection on Microblog: A Survey. Journal of Chinese Information Processing. 2022, 36(12): 1-15


Funding

National Key Research and Development Program of China; Fundamental Research Funds for the Central Universities (EOE48922X2)