异质信息网络中基于元路径的社团发现算法研究

郑玉艳,王明省,石川,王锐

PDF(2945 KB)
PDF(2945 KB)
中文信息学报 ›› 2018, Vol. 32 ›› Issue (9) : 132-142.
情感分析与社会计算

异质信息网络中基于元路径的社团发现算法研究

  • 郑玉艳1,王明省2,石川1,王锐1
作者信息 +

Research on Community Detection Algorithm Based on Meta Path in Heterogeneous Information Network

  • ZHENG Yuyan1, WANG Mingsheng2, SHI Chuan1, WANG Rui1
Author information +
History +

摘要

实际的网络化数据往往包含多种类型的对象和关系,采用异质信息网络可以更好地对其建模,因此异质信息网络分析逐渐成为数据挖掘的研究热点。虽然同质信息网络中的社团发现已经被深入研究,但是异质信息网络中的社团发现还很少被研究。该文研究异质信息网络中的社团发现问题,提出了一个新的社团发现算法框架HCD(heterogeneous community detection)。该框架由两部分组成: 基于单条元路径的社团发现算法HCD_sgl和融合多条元路径的社团发现算法HCD_all。HCD_sgl首先确定在给定元路径下所有节点的初始标签,再利用改进的标签传递算法进行最终的社团发现;HCD_all是在HCD_sgl的基础上将基于多条元路径的社团发现结果进行融合。通过在真实数据集和人工数据集上的实验验证了HCD算法的有效性。

Abstract

The real networked data often contain different types of objects and relations,which can be better modeled with heterogeneous information network. Although the community detection in homogeneous information networks has been intensively studied,few works are done in heterogeneous information networks.In this paper,we study the community detection problem in heterogeneous information networks,and propose a novel method based on meta path called HCD (heterogeneous community detection). This method consists of two parts: a HCD_sgl algorithm based on single meta path,and a HCD_all algorithm combining multiple meta paths. The HCD_sgl decides the initial community label,then detecting the final community structures through the improved label propagation algorithm. HCD_all combined the results of multipie meta paths.Experiments on real dataset and artificial dataset demonstrate that the proposed method can detect community structures in heterogeneous information networks effectively.

关键词

异质信息网络 / 社团发现 / 元路径 / 语义相似性度量

Key words

heterogeneous information network / community detection / meta path / semantic similarity measure

引用本文

导出引用
郑玉艳,王明省,石川,王锐. 异质信息网络中基于元路径的社团发现算法研究. 中文信息学报. 2018, 32(9): 132-142
ZHENG Yuyan, WANG Mingsheng, SHI Chuan, WANG Rui. Research on Community Detection Algorithm Based on Meta Path in Heterogeneous Information Network. Journal of Chinese Information Processing. 2018, 32(9): 132-142

参考文献

[1] Sun Y,Han J,Yan X,et al. Pathsim:Meta path-based top-k similarity search in heterogeneous information networks [J].Proceedings of the VLDB Endowment,2011,4(11):992-1003.
[2] Shi C,Li Y,Zhang J,et al. A survey of heterogeneous information network analysis[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(1):17-37.
[3] Infonet[OL].http://www.cs.uiuc.edu/homes/hanj/projs/infonet. html.
[4] Han J. Mining heterogeneous information networks by exploring the power of links[C]//Proceedings of International Conference on Discovery Science. Springer Berlin Heidelberg,2009:13-30.
[5] Sun Y,Han J. Mining heterogeneous information networks:Principles and methodologies [J].Synthesis Lectures on Data Mining and Knowledge Discovery,2012,3(2):1-159.
[6] Wang C,Raina R,Fong D,et al. Learning relevance from heterogeneous social network and its application in online targeting[C]//Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM,2011:655-664.
[7] Bedi P,Sharma C. Community detection in social networks [J].Wiley Interdisciplinary Reviews:Data Mining and Knowledge Discovery,2016,6(3):115-135.
[8] Raghavan U N,Albert R,Kumara S. Near linear time algorithm to detect community structures in large-scale networks[J].Physical Review E,2007,76(3):036106.
[9] Xie J,Szymanski B K. Towards linear time overlapping community detection in social networks[C]//Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer Berlin Heidelberg,2012:25-36.
[10] Hu W. Finding statistically significant communities in networks with weighted label propagation[J].Social Networking,2013,02(3):138-146.
[11] Gregory S. Finding overlapping communities in networks by label propagation[J].New Journal of Physics,2009,12(10):2011-2024.
[12] Newman M E J,Girvan M. Finding and evaluating community structure in networks[J].Physical Review E,2004,69(2):026113.
[13] Kernighan B W,Lin S. An efficient heuristic procedure for partitioning graphs[J].Bell System Technical Journal,1970,49(2):291-307.
[14] Newman M E J. Community detection and graph partitioning[J].Europhysics Letters,2013,103(2):330-337.
[15] Newman M E J. Fast algorithm for detecting community structure in networks[J].Physical Review E,2004,69(6):066133.
[16] Girvan M,Newman M E J. Community structure in social and biological networks [J].Proceedings of the National Academy of Sciences,2002,99(12):7821-7826.
[17] Blondel V D,Guillaume J L,Lambiotte R,et al. Fast unfolding of communities in large networks [J].Journal of Statistical Mechanics:Theory and Experiment,2008(10):P10008.
[18] Pizzuti C. Ga-net:A genetic algorithm for community detection in social networks[C]//Proceedings of International Conference on Parallel Problem Solving from Nature. Springer Berlin Heidelberg,2008:1081-1090.
[19] Liu X,Li D,Wang S,et al. Effective algorithm for detecting community structure in complex networks based on GA and clustering[C]//Proceedings of International Conference on Computational Science. Springer Berlin Heidelberg,2007:657-664.
[20] Tasgin M,Herdagdelen A,Bingol H.Community detection in complex networks using genetic algorithms[J].arXiv preprint arXiv:0711.0491,2007.
[21] Sun Y,Norick B,Han J,et al.PathSelClus:Integrating meta-path selection with user-guided object clustering in heterogeneous information networks [J].ACM Transactions on Knowledge Discovery from Data,2012,7(3):723-724.
[22] Luo C,Pang W,Wang Z.Semi-supervised clustering on heterogeneous information networks[C]//Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer International Publishing,2014:548-559.
[23] Lao N,Cohen W W. Relational retrieval using a combination of path-constrained random walks [J].Machine Learning,2010,81(1):53-67.
[24] Ji M,Sun Y,Danilevsky M,et al. Graph regularized transductive classification on heterogeneous information networks[C]//Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg,2010:570-586.
[25] 李依桐,黄岳,石川等. 基于矩阵分解的异质信息网络聚类分析研究[J].小型微型计算机系统,2014,35(10):2256-2261.
[26] Sun Y,Han J,Zhao P,et al.Rankclus:Integrating clustering with ranking for heterogeneous information network analysis[C]//Proceedings of the 12th International Conference on Extending Database Technology:Advances in Database Technology. ACM,2009:565-576.
[27] Newman M E J. Analysis of weighted networks[J].Physical Review E,2004,70(5):056131.
[28] Fiedler M. Algebraic connectivity of graphs [J].Czechoslovak Mathematical Journal,1973,23(2):298-305.
[29] Shi C,Kong X,Yu P S,et al. Relevance search in heterogeneous networks[C]//Proceedings of the 15th International Conference on Extending Database Technology. ACM,2012:180-191.
[30] Shi C,Kong X,Huang Y,et al. Hetesim:A general framework for relevance measure in heterogeneous networks[J].IEEE Transactions on Knowledge and Data Engineering,2014,26(10):2479-2492.

基金

国家重点基础研究发展计划(2017YFB0803304);国家自然科学基金(61772082,61375058);北京市自然科学基金(4182043)
PDF(2945 KB)

Accesses

Citation

Detail

段落导航
相关文章

/