Social bots on microblog platforms significantly affect information dissemination and the stance of public opinion. This paper reviews recent research on social bot account detection in microblogs, particularly Twitter and Weibo. Popular methods for data acquisition and feature extraction are reviewed. Bot detection algorithms are then summarized and evaluated, including approaches based on statistical methods, classical machine learning, and deep learning. Finally, some suggestions for future research are offered.
ZHANG Xuan, LI Baobin.
Social Bot Account Detection on Microblog: A Survey. Journal of Chinese Information Processing. 2022, 36(12): 1-15