An Improved PageRank Algorithm with Anti-Link Spam

HE Zhiming1, WANG Lihong2, ZHANG Gang1, CHENG Xueqi1

Journal of Chinese Information Processing ›› 2012, Vol. 26 ›› Issue (5) : 101-107.
Review


Abstract

A large number of link-based search engine spamming techniques have severely affected the traditional PageRank algorithm: link farms, link exchanges, golden chains, wealth chains, and similar schemes have cost PageRank values their fairness and authority. Building on an analysis of how these spamming techniques harm traditional PageRank, this paper proposes a three-stage PageRank algorithm, TSPageRank, that can resist link spam to a certain extent. The paper analyzes the principles of TSPageRank in detail, and experiments show that TSPageRank improves on the effectiveness of traditional PageRank by 59.4%: it raises the PageRank values of important, authoritative pages and lowers those of spam pages.
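The abstract does not describe the three stages of TSPageRank, so the following sketch does not attempt to reproduce it. It only illustrates the problem the paper addresses: under the classic PageRank iteration, a link farm can inflate the score of a target page. The graph, the page names, the damping factor of 0.85, and the farm size of ten are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Classic PageRank by power iteration over a dict {page: [outlinked pages]}."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page splits its current rank evenly among its outlinks.
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


# A small hypothetical web: "hub" is linked by four pages, "target" by one.
honest = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "target"],
    "hub": ["a"],
    "target": ["hub"],
}

# The same web plus a hypothetical link farm: ten pages that exchange links with "target".
farm = {f"farm{i}": ["target"] for i in range(10)}
spammed = dict(honest, **farm)
spammed["target"] = ["hub"] + list(farm)

before = pagerank(honest)
after = pagerank(spammed)
print(f"target before farm: {before['target']:.3f}, hub: {before['hub']:.3f}")
print(f"target after farm:  {after['target']:.3f}, hub: {after['hub']:.3f}")
```

In this toy graph the farm lifts "target" above "hub" even though no honest page changed its links, which is the kind of distortion an anti-spam variant of PageRank is meant to counteract.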
Key words

search engine spam / PageRank algorithm / link farm

Cite this article

HE Zhiming, WANG Lihong, ZHANG Gang, CHENG Xueqi. An Improved PageRank Algorithm with Anti-Link Spam. Journal of Chinese Information Processing. 2012, 26(5): 101-107


Funding

National Natural Science Foundation of China (61170230, 60903139, 60873243, 60933005); Key Projects of the National 863 Program of China (2010AA012502, 2010AA012503)