Abstract: This paper aims to verify Zipf's law in the Korean language. First, the statistical distributions of two linguistic units, words and letters, are investigated on a large Korean text corpus. Then the least squares method is used to fit the rank-frequency distribution curve of words in Korean text. Finally, the estimated values of the parameter of Zipf's law are calculated. The experimental results show that the relationship between frequency and rank for both linguistic units follows Zipf's law in Korean.
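The least squares step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common form of Zipf's law, f(r) = C / r^α, which becomes the straight line log f = log C − α log r on log-log axes, and fits that line by ordinary least squares. The function name `fit_zipf`, its `tokens` argument, and the single-line (non-piecewise) fit are assumptions made for illustration; the paper's actual procedure may differ.

```python
import math
from collections import Counter


def fit_zipf(tokens):
    """Estimate (alpha, C) in f(r) = C / r**alpha from a list of tokens.

    Hypothetical helper: fits log f = log C - alpha * log r by
    ordinary least squares over all ranks.
    """
    # Frequencies sorted in descending order; rank 1 is the most frequent unit.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two distinct units to fit a line")

    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(f) for f in freqs]

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                    # equals -alpha
    intercept = mean_y - slope * mean_x  # equals log C
    return -slope, math.exp(intercept)
```

Calling `fit_zipf` on the word tokens of a corpus, and again on its letter tokens, would give one (α, C) pair per linguistic unit. A fitted α close to 1 is what the classical statement of Zipf's law predicts.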