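Correctly identifying sock-puppet relations among online accounts helps maintain a safe and harmonious network environment and curbs illegal behavior and false information online. Authorship identification based on text mining has long received wide attention, but little research addresses the identification of author relations for texts in social networks. This paper proposes a sock-puppet identification method for social network accounts. Based on the style of online language and on account relations, it extracts two groups of features, online text features and the frequency of reply relations between accounts, to form the feature set, and builds the training sample vector space from account pairs to identify sock-puppet relations between accounts. The method was validated experimentally on forum data, reaching an accuracy of 80%; the results show that the method identifies sock-puppets with high accuracy.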
Abstract
Real-name registration is difficult to enforce in social networks, and this is a worldwide problem. Some users operate multiple IDs (usually called "sock-puppets") to publish disruptive views for illegal purposes, such as starting or spreading rumors. It is important to find a way to identify these users. In this paper, we propose to extract features from text data and social relation data, and to train a novel vector space model based on combinations of different IDs to detect the sock-puppet relation. In an experiment on forum data, we achieved a classification precision of 93%. The result verifies the effectiveness of the proposed method.
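The pair-based sample construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual feature set or classifier, so everything here is an assumption made for illustration: the function names, the marker vocabulary `vocab`, whitespace tokenization (Chinese forum text would need a word segmenter), and the use of absolute differences to combine two accounts' style features.

```python
from collections import Counter
from itertools import combinations


def style_features(posts, vocab):
    """Relative frequency of each marker token over all of an account's posts."""
    tokens = [tok for post in posts for tok in post.split()]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty accounts
    return [counts[word] / total for word in vocab]


def pair_vector(a, b, posts, replies, vocab):
    """One sample for the account pair (a, b): style differences plus reply counts."""
    fa = style_features(posts[a], vocab)
    fb = style_features(posts[b], vocab)
    style_diff = [abs(x - y) for x, y in zip(fa, fb)]
    # Reply frequency in both directions; missing pairs count as zero.
    reply_counts = [replies.get((a, b), 0), replies.get((b, a), 0)]
    return style_diff + reply_counts


def build_pair_samples(posts, replies, vocab):
    """Build one feature vector for every unordered pair of accounts."""
    return {
        (a, b): pair_vector(a, b, posts, replies, vocab)
        for a, b in combinations(sorted(posts), 2)
    }


# Hypothetical toy data: three accounts and one observed reply relation.
posts = {"u1": ["lol ok", "lol"], "u2": ["lol fine"], "u3": ["hmm"]}
replies = {("u1", "u2"): 3}  # u1 replied to u2 three times
samples = build_pair_samples(posts, replies, vocab=["lol", "hmm"])
```

Each resulting vector would then be labeled (sock-puppet pair or not) and passed to a classifier of choice; which classifier the paper uses is not stated in the abstract.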
Keywords
sock-puppet identification /
linguistic style /
relation features /
social network
Key words
sock-puppet identification /
writing style /
relation feature /
social network
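Funding
National Key Basic Research Program of China (973 Program) (2012CB316303, 2013CB329602); National Natural Science Foundation of China (61232010); National Natural Science Foundation of China (61173064); National Key Technology R&D Program of China (2012BAH39B04)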