LI Heyuan 1,2, YU Xiaoming 1, LIU Yue 1, CHENG Xueqi 1, CHENG Gong 3
1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100190, China; 3. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Abstract: Micro-blogs have changed the way people obtain information. However, micro-blogs have been infiltrated by a large amount of spam, which poses a challenge to normal users. In this paper, we study spam in Chinese micro-blogs. We analyze the behavior of spam users and propose 7 new features for detecting them. We then describe how to apply these features to spammer detection with an SVM classifier. The experimental results indicate that the accuracy and recall of the proposed method are satisfactory.
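The pipeline summarized in the abstract (per-user feature vectors, an SVM classifier, evaluation by accuracy and recall) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (fraction of posts containing a URL, followers-to-followees ratio) are hypothetical stand-ins and are not the 7 features proposed in the paper, the data is invented toy data, and the classifier is a small pure-Python linear SVM trained by subgradient descent, not the SVM implementation used in the experiments.

```python
# Illustrative sketch only: hypothetical features, toy data, simple linear SVM.

def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 (spammer) or -1 (normal)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Classified with enough margin: only the regularizer acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (spammer) or -1 (normal user) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def accuracy_and_recall(y_true, y_pred):
    """Accuracy over all users; recall over the spammer (+1) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical per-user features: [URL fraction, followers/followees ratio].
X = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.7, 0.1],   # spammers
     [0.1, 0.9], [0.2, 1.0], [0.05, 0.8], [0.3, 0.7]]    # normal users
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
preds = [predict(w, b, x) for x in X]
acc, rec = accuracy_and_recall(y, preds)
print("accuracy:", acc, "recall:", rec)
```

Recall is computed on the spammer class because a missed spammer is the error the method is meant to reduce. A real evaluation would score held-out users, not the training set used in this toy run.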