基于幂律分布的网络用户快速排序算法

张 玥,张宏莉,张伟哲

PDF(3103 KB)
PDF(3103 KB)
中文信息学报 ›› 2012, Vol. 26 ›› Issue (4) : 122-129.
综述

基于幂律分布的网络用户快速排序算法

  • 张 玥,张宏莉,张伟哲
作者信息 +

Quick Ranking Algorithm for Network User Based on Power Law Distribution

  • ZHANG Yue, ZHANG Hongli, ZHANG Weizhe
Author information +
History +

摘要

随着网络论坛、博客、微博的发展,引出社会网络中的用户排序问题。将在线网络论坛中用户映射为节点,用户评论过程中形成的回复关系映射为有向关联图,其节点度符合幂律分布。且论坛中用户的主题发布行为和回复关系符合Pagerank算法的互增强和随机游走特性,因此选用Pagerank算法排序用户影响力。该文提出的研究问题 如何提高用户排序应用中数据的存储和运行效率。天涯网络论坛中80%以上用户入度为0,据此,根据入度是否为0划分为两个集合,对入度为0集合按出度构造链接表,设计了基于集合划分的高效排序算法SD-Rank。SD-Rank时空复杂性为O(V′),V′为入度非0节点集。对天涯网络论坛真实用户数据的实验结果表明 SD-Rank算法时空复杂性优于Pagerank算法。

Abstract

With the wide application of BBS, blog and micro blog etc, how to rank the user becomes a well-recognized research issue,especially in the social network area. The paper analyzes the users relational graph in network BBS, representing the correlation graph by users and replies between users, and reveals its power law distribution in in/out degree. Owing the fact that the users behavior of posting and replying is in accordance with Pageranks characteristics of random walking and mutual-enforcement, the users influence is ranked with Pagerank algorithm. This paper further addresses the issue of time-space ratio in computation resulted by the exponent growth of users number. Based on the observation that over 80 percent users indegree is 0. Using list structure designs the efficient set-division-ranking arithmetic (SD-Rank) is designed by via list structure after dividing users into two sets0-indegree in set0 and non-0-indegree in set1. Through set partitions according to degree distribution, the time-space complexity of SD-Rank is decreased from O(V+E) to O(V′), in which V′ is the size of set1. Experiment on TIANYA BBS dataset shows that SD-Rank is more efficient than Pagerank.
Key wordspower law; in degree; set division; quick rank

关键词

幂律 / 入度 / 集合划分 / 快速排序

Key words

power law / in degree / set division / quick rank

引用本文

导出引用
张 玥,张宏莉,张伟哲. 基于幂律分布的网络用户快速排序算法. 中文信息学报. 2012, 26(4): 122-129
ZHANG Yue, ZHANG Hongli, ZHANG Weizhe. Quick Ranking Algorithm for Network User Based on Power Law Distribution. Journal of Chinese Information Processing. 2012, 26(4): 122-129

参考文献

[1] Agarwal N., Liu H., Tang L., et al. Identifying the influential bloggers in a community[C]//Proceedings of the international conference on Web search and web datamining(ICWSM 08),New York, US, ACM, 2008:207-218.
[2] Paul N. Bennett, Krysta Svore, Susan T. Dumais.Classification-enhanced Ranking[C]//Proceedings of the 19th international conference on World Wide Web, Raleigh, NC, USA, 2010:111-120.
[3] T. L. Fond, J. Neville. Randomization Tests for Distinguishing Social Influence and Homophily Effects[C]//Proceedings of the 19th international conference on World Wide Web, Raleigh, NC,USA, 2010:601-610.
[4] H. Kwak, C. Lee, H. Park, et al. What is Twitter, a Social Network or a News Media[C]//Proceedings of the 19th international conference on World Wide Web, Raleigh, NC, USA, 2010:591-600.
[5] S. Wu, J. M. Hofman, W. A. Mason, et al. Who says What to Whom on Twitter[C]//Proceedings of the 20th international conference on World Wide Web, Madrid, India, 2011:705-714.
[6] M. Cha, H. Haddadi, F. Benevenuto, e al. Measuring user influence on wtitter: The million follower fallacy[C]//Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, Washington DC, 2010.
[7] J. Weng, E. P. Lim, J. Jiang, et al. Twitterrank: finding topic-sensitive influential twitterers[C]//Proceedings of the 4th ACM international conference on Web search and data mining(WSDM), 2010:261-270.
[8] Dan Cosley, D. Huttenlocher,Jon Kleinberg,et al. Sequential Influence Models in Social Networks[C]//Proceedings of the International Conference on Weblogs and Social Media(ICWSM), 2010.
[9] A. Anagnostopoulos, R.Kumar, M. Mahdian. Influence and correlation in social networks[C]//Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining,2008:7-15.
[10] P. Singla, M. Richardson. Yes, there is a correlation: from social networks to personal behavior on the web[C]//Proceedings of the 17th international conference on World Wide Web, 2008:655-664.
[11] Jie Tang, Jimeng Sun, Chi Wang, et al. Social Influence Analysis in Large-scale Networks[C]//Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2009:721-730.
[12] Aditya Pai, Scott Counts. Identifying Topical Authorities in Microblogs[C]//Proceedings of the fourth ACM international conference on Web search and data mining(WSDM ), 2011:45-54.
[13] J. Leskovec, E. Horvitz. Planetary-scale views on a large instant-mssaging network[C]//Proceedings of the 17th international conference on World Wide Web. Beijing, China, 2008:915-924.
[14] D. Kemp, J.Kleinberg, E. Tardos. Maximizing the spread of influence through a social network[C]//Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, 2003:137-146.
[15] S Brin. L Page, R Motwani, T Winograd. The pagerank citation ranking: Bringing order to the web[R]. Technical report, Stanford University, 1999.
[16] J Kleinberg. Authoritative sources in a hyperlinked environment[J]. Journal of the ACM, 1999,46(5):604-632
[17] Frank Mcsherry. A uniform Approach to Accelerated Pagerank Computation[C]//Proceedings of the 14th international conference on WWW, 2005:575-582.
[18] Sepandar D. Kamvar, T H. Haveliwala, C D. Manning, et al. Extrapolation Methods for Accelerating Pagerank[C]//Proceedings of the 12th international conference on WWW, 2003:261-270.
[19] Stoer Josef, R. Bulirsch. Introduction to Numerical Analysis[M]. Berlin, Dover Publications. 2002: 619
[20] Thomas A. Smith. The web of law[J]. San Diego Law Review,2007,44(309).
[21] A. Broder, R. Kumar, F. Maghoul, et al. Graph structure in the Web[C]//Proceedings of the 9th International World Wide Web Conference, 2000:309-320.
[22] David Easley, Jon Kleinberg. Networks, Crowds, and Markets: Reasoning about a Highly Connected World[M]. Cambridge University Press, 2010. P546
[23] R. Fagin, R.Kumar, D. Sivakumar. Comparing top k lists[C]//Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms,2003:28-36.
[24] F. McCown, M. L. Nelson. Agreeing to disagree: search engines and their public interfaces[C]//Proceedings of the 7th ACM/IEEE-CS joint conference on digital libraries. ACM,2007:309-318.
[25] Thomas H. Cormen, Charles E.Leiserson, Ronald L. Rivest, et al. Introduction to Algorithms[M]. The MIT Press.2001. P527
[26] N. Agrawal, H. Liu, L. Tang, et al. Identifying the Influential Bloggers in a Community[C]//Proceedings of the international conference on Web search and web data mining(WSDM08), 2008:207-218.
     ()()

基金

国家863自然科学基金(2010AA012504);国家973重点基础研究发展规划项目基金(G2011CB302605);国家自然科学基金(61173145)
PDF(3103 KB)

467

Accesses

0

Citation

Detail

段落导航
相关文章

/