本文提出了一种基于分解转移矩阵的PageRank的迭代计算方法。该方法对PageRank理论模型进一步推导,把其Markov状态转移矩阵进行了分解,从而降低存储开销和计算复杂度,减少I/O需求,使得PageRank计算的工程化实现更为简单。实验表明1 700多万的网页2.8亿条链接,可以在30秒内完成一次迭代,内存需求峰值585MB,可以满足工程化应用的需求。
Abstract
This paper proposes a method of computing PageRank based on transfer matrix decomposition. Based on the PageRank random surfer model, the method decomposes the Markov states transfer matrix, so that the memory cost, computational complexity and I/O needs are reduced drastically. Experiments show that each iteration can be completed in 30 seconds and that the peak memory demand is 585MB during the PageRank computation of 17 million Web Pages containing 280 million links, indicating that this method meets the demand for engineering applications.
关键词
计算机应用 /
中文信息处理 /
PageRank /
搜索引擎 /
Markov状态转移矩阵 /
矩阵分解
{{custom_keyword}} /
Key words
computer application /
Chinese information processing /
PageRank /
search engine /
Markov state transition matrix /
matrix decomposition
{{custom_keyword}} /
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] Sergey Brin, Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine[A]. In: Proceedings of the 7th International WWW Conference [C]. Brisbane, Australia: 1998.107-117.
[2] Taher H. Haveliwala. Efficient Computation of PageRank[R].Stanford University Technical Report. 1999.
[3] Taher Haveliwala, Sepandar Kamvar et al. Extrapolation methods for accelerating PageRank computations[A]. In: Proceedings of the 12th International WWW Conference [C].2003.261-270.
[4] Taher Haveliwala, Sepandar Kamvar, Dan Klein et al. Computing PageRank using Power Extrapolation[R].Technical Report, Stanford University, 2003.
[5] Sepandar Kamvar, Taher Haveliwala, Chris Manning, et al. Exploiting the Block Structure of the Web for Computing Pagerank[R]. Stanford University, 2003.
[6] Konstantin Avrachenkov and Nelly Litvak. Decomposition of the Google PageRank and optimal. linking strategy[R]. Technical Report, INRIA, January, 2004.
[7] Lawrence Page, Sergey Brin, Rajeev Motwani, et.al. The PageRank Citation Ranking: Bringing Order to the Web[R]. Stanford Digital Libraries Working Paper, 1998.
[8] Boldi P,Vigna S.The Webgraph framework I: Compression techniques[A]. In: Proceedings of the 13th World Wide Web Conference[C]. New York: ACM Press, 2004. 595-601.
[9] IBM Almaden Research Center, CLEVER Searching [DB/OL]. http://www.almaden.ibm.com.
[10] George Kingsley Zipf. Selective studies and the principle of relative frequency in language[M]. Massachusetts: Harvard University Press, Cambridge, 1931.
[11] D. Donato, L. Laura, S. Leonardi, et al. Large scale properties of the webgraph[J]. European Physical Journal B, 2004, 2:239-243.
{{custom_fnGroup.title_cn}}
脚注
{{custom_fn.content}}
基金
863计划重点项目资助(2006AA010105);北京市教委科技发展计划项目资助(KM200710772010);北京市属市管高校人才强教计划项目资助(PXM2007_014224_044677,PXM2007_014224_044676)
{{custom_fund}}