Abstract
Social networks typically host many types of accounts at once, including ordinary individual users, paid posters (the "water army"), zombie followers, and verified organization accounts (blue V). We define an implicit organization as an account that is registered as an individual but whose behavior shows organizational characteristics. Such an account is usually run by a dedicated team, so its behavior follows the pattern of an organization rather than that of an individual. Discovering implicit organizations effectively is important for tasks such as analyzing how public opinion spreads through social networks and recommending advertisements. Taking Sina Weibo data as an example and building on a data collection system, this paper studies the classification of individuals and implicit organizations. We manually labeled 583 accounts, extracted 22 features, and built a Naive Bayes model and a decision tree model. The models identify implicit organizations with a precision of 86.4%, and we also derive a ranking of feature importance. The experiments show that implicit organizations exist in social networks and that their behavioral characteristics can be identified.
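The classification step can be illustrated with a minimal sketch of a Gaussian Naive Bayes classifier, one common variant of the Naive Bayes family. This is not the authors' implementation: the two features (`posts_per_day`, `work_hour_share`), the class means, and the helper names are hypothetical, and the data is randomly generated. Only the sample size of 583 echoes the abstract, so the score it prints says nothing about the reported result.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one account."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


def make_synthetic_accounts(n, seed=0):
    """Generate toy data: label 1 = implicit organization, 0 = individual."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Hypothetical features, not taken from the paper's 22 features.
        posts_per_day = rng.gauss(12.0 if label else 3.0, 2.0)
        work_hour_share = rng.gauss(0.8 if label else 0.4, 0.1)
        X.append([posts_per_day, work_hour_share])
        y.append(label)
    return X, y


# Train on 80% of the synthetic accounts and score the remaining 20%.
X, y = make_synthetic_accounts(583)
split = int(0.8 * len(X))
model = fit_gaussian_nb(X[:split], y[:split])
predictions = [predict(model, row) for row in X[split:]]
accuracy = sum(p == t for p, t in zip(predictions, y[split:])) / len(predictions)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

A decision tree could be trained on the same feature matrix. Tree-based models also expose per-feature importance scores, which is one plausible route to a feature ranking like the one the abstract mentions.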
Key words
social network /
implicit organization /
machine learning algorithm
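Funding
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200); National Key Technology R&D Program of China (2012BAH46B03); National Natural Science Foundation of China (61272427)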