1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; 2. Research Center of Tibetan Information Technology, Department of Computer Science, Tibet University, Lhasa, Tibet 850000, China
Zipf's Law, an important rule in bibliometrics, has been widely applied in many fields. With the accelerating explosion of online information, webometrics has received much attention. We suggest that Zipf's Law may also hold in webometrics, specifically in the distribution of the numbers of search results. We select a public word set and conduct experiments on several popular search engines. The experimental results show that the numbers of search results roughly conform to Zipf's Law. The Zipf index of the numbers of search results from Baidu and So is 0.003.
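Zipf's Law states that when items are ranked by frequency, the frequency f(r) of the item at rank r falls off roughly as f(r) ≈ C / r^α, where C is a constant and α is the Zipf exponent (index). The abstract does not say how the index was estimated, so the following is only a minimal sketch of one common approach: rank the per-query result counts in descending order and fit a least-squares line to log f against log r. The function name `fit_zipf` and its input format are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence, Tuple


def fit_zipf(counts: Sequence[float]) -> Tuple[float, float]:
    """Estimate (C, alpha) for f(r) ~ C / r**alpha from raw counts.

    Hypothetical helper: counts are, e.g., the number of search results
    returned for each query word. Non-positive counts are dropped because
    their logarithm is undefined.
    """
    # Rank 1 is the largest count, rank 2 the next largest, and so on.
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive counts")

    # Work in log-log space, where Zipf's Law becomes a straight line:
    # log f = log C - alpha * log r
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    # Ordinary least-squares slope and intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The Zipf exponent is the negated slope; C is exp(intercept).
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic counts that follow Zipf's Law exactly (not data from the paper).
    demo = [1000.0 / r for r in range(1, 51)]
    c, alpha = fit_zipf(demo)
    print(f"C = {c:.1f}, alpha = {alpha:.3f}")  # prints: C = 1000.0, alpha = 1.000
```

A log-log least-squares fit is easy to read but sensitive to the noisy tail of small counts; maximum-likelihood estimators are often preferred when fitting power laws rigorously.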
LIU Shengjiu, LI Tianrui, ZHU Jie. Zipf's Law and Webometrics. Journal of Chinese Information Processing, 2015, 29(4): 89-94.