Graph Representation Learning Based Group Fraud Risk Detection in the Consumer Finance Domain

FU Xiangling, YAN Chenwei, ZHAO Pengya, SONG Meiqi, WU Weiqiang

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 9: 120-128, 138.
Sentiment Analysis and Social Computing

Abstract

Fraud detection in consumer finance is an important problem for both academia and industry. The prevailing approach applies machine learning to features extracted from each user's inherent attributes. With the emergence of group fraud, these traditional methods perform poorly when fraudulent samples are few and feature data are insufficient. Because users in a fraud group are closely related to one another, this paper builds a user association network from call data between users and extracts graph features for each user node with network statistical indicators and the DeepWalk algorithm, making full use of the graph's topological structure and neighbor information. These graph features, together with the users' inherent features, are fed to a LightGBM model. Experimental results show that with graph representation learning, AUC improves by 7.3% compared with using inherent features alone.
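The graph-side feature extraction described in the abstract can be illustrated with a minimal sketch. It assumes call data arrive as (caller, callee) pairs and uses only the Python standard library. The sample records, function names, and parameters (`walks_per_node`, `walk_length`) are hypothetical and do not come from the paper. The sketch covers building the association network, two common network statistics (degree and local clustering coefficient), and the truncated random walk sampling stage of DeepWalk. The skip-gram training that turns walks into embeddings and the LightGBM classifier are omitted.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_call_graph(call_records):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        if caller == callee:
            continue  # ignore self-calls
        graph[caller].add(callee)
        graph[callee].add(caller)
    return dict(graph)


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def random_walks(graph, walks_per_node=10, walk_length=5, seed=0):
    """Sample truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # Every node in the map has at least one neighbor.
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


# Hypothetical call records between four users.
calls = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
graph = build_call_graph(calls)

# Per-node statistical features: degree and local clustering coefficient.
stats = {n: (len(graph[n]), clustering_coefficient(graph, n)) for n in graph}

# Walks that a skip-gram model would consume to learn node embeddings.
walks = random_walks(graph, walks_per_node=2, walk_length=5)
```

In a pipeline like the one the abstract describes, the learned node embeddings would be concatenated with the statistics in `stats` and with each user's inherent features before being passed to the classifier.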

Key words

fraud detection / group fraud / related network / graph representation learning

Cite this article

FU Xiangling, YAN Chenwei, ZHAO Pengya, SONG Meiqi, WU Weiqiang. Graph Representation Learning Based Group Fraud Risk Detection in the Consumer Finance Domain. Journal of Chinese Information Processing. 2022, 36(9): 120-128, 138

Funding

National Natural Science Foundation of China (72274022)