The Frequency-Rank Relation of Language Units in Modern Chinese Computational Language Model

Guan Yi, Wang Xiaolong, Zhang Kai

Journal of Chinese Information Processing, 1999, Vol. 13, Issue 2: 9-16.
Survey

Abstract

Zipf's law, widely studied by linguists and statisticians, is a general statistical regularity best known for describing the frequency distribution of English words. Through experiments, we find that for language units of modern Chinese such as characters, words, and bigrams, the relation between frequency and rank also approximately follows Zipf's law, which indicates that the law applies generally to Chinese language units at different levels. This paper experimentally confirms the frequency-rank relation of Chinese language units described by Zipf's law, and then discusses its significance as a guide for Chinese natural language processing technologies, especially the construction of statistics-based computational language models of modern Chinese.
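The frequency-rank check described in the abstract can be illustrated with a small sketch. Zipf's law is commonly written as f(r) ≈ C / r^a with a close to 1, so the logarithm of frequency plotted against the logarithm of rank should fall near a straight line with slope about -1. The code below is only a minimal illustration under that assumption, not the authors' experimental procedure: the function names (`rank_frequency`, `zipf_slope`, `bigrams`) are hypothetical, the fit is an ordinary least-squares line, and the input is synthetic rather than a Chinese corpus.

```python
import math
from collections import Counter


def rank_frequency(units):
    """Return unit frequencies in descending order; index i holds rank i + 1."""
    return sorted(Counter(units).values(), reverse=True)


def bigrams(units):
    """Adjacent pairs of units (characters or words), as tuples."""
    return list(zip(units, units[1:]))


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two ranks; a value near -1 is consistent with Zipf's law.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic check (assumed data): frequencies exactly proportional to 1 / rank.
ideal = [1000 / r for r in range(1, 51)]
slope = zipf_slope(ideal)
```

With a real corpus, `rank_frequency` would be applied to the character sequence, to a segmented word list, and to `bigrams(...)` of either one, and the fitted slopes compared across the three kinds of unit.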

Key words

Zipf's law / Chinese character frequency / Chinese word frequency / Chinese bigram frequency

Cite this article

Guan Yi, Wang Xiaolong, Zhang Kai. The Frequency-Rank Relation of Language Units in Modern Chinese Computational Language Model. Journal of Chinese Information Processing. 1999, 13(2): 9-16
