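This paper proposes incorporating inter-word distance information into the N-gram statistical language model; the resulting model is called the distance-weighted related-word statistical language model. The model can take into account relationships between non-adjacent words in a sentence. Following the principle that the closer two words are, the more closely they are related, it introduces distance information through a distance-weighting function to improve the model's predictive power. The model is also applied to a Chinese whole-sentence pinyin input method system. Experiments show that, compared with the traditional N-gram statistical language model, the model lowers the Chinese character error rate and yields some improvement in model performance.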
Abstract
This paper proposes a novel language model based on the traditional N-gram model. Because it integrates inter-word distance information, it is referred to as the distance-weighted statistical language model. The model takes the relationships between non-adjacent words into consideration. Based on the principle that words closer together are more closely related, a distance-weighting function is used to integrate the distance information and so improve the model's prediction ability. Experimental results show that, compared with the original N-gram model, the proposed language model reduces the word error rate of a Chinese whole-sentence IME system.
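The idea of weighting non-adjacent word pairs by distance can be illustrated with a minimal sketch. The abstract does not state the actual weighting function, window size, or smoothing method, so everything below is an assumption made for illustration: the geometric decay in `distance_weight`, the `max_distance` window, the add-alpha smoothing, and all function and parameter names are hypothetical and are not the authors' formulation.

```python
from collections import defaultdict


def distance_weight(d, decay=0.5):
    """Hypothetical weight: 1.0 for adjacent words, shrinking geometrically with distance d."""
    return decay ** (d - 1)


def train_pair_counts(sentences, max_distance=3, decay=0.5):
    """Collect distance-weighted co-occurrence counts for (earlier word, later word) pairs."""
    pair = defaultdict(float)   # weighted count of (history word, predicted word)
    total = defaultdict(float)  # weighted mass accumulated by each history word
    for words in sentences:
        for i, w in enumerate(words):
            for d in range(1, max_distance + 1):
                if i - d < 0:
                    break
                h = words[i - d]
                wt = distance_weight(d, decay)
                pair[(h, w)] += wt
                total[h] += wt
    return pair, total


def score(word, history, pair, total, max_distance=3, decay=0.5,
          alpha=0.01, vocab_size=1000):
    """Distance-weighted average of add-alpha smoothed pair probabilities (an assumed scheme)."""
    num = 0.0
    den = 0.0
    recent = history[-max_distance:]
    for d, h in enumerate(reversed(recent), start=1):
        wt = distance_weight(d, decay)
        p = (pair.get((h, word), 0.0) + alpha) / (total.get(h, 0.0) + alpha * vocab_size)
        num += wt * p
        den += wt
    # With an empty history, fall back to a uniform guess over the vocabulary.
    return num / den if den else 1.0 / vocab_size


corpus = [["i", "like", "green", "tea"], ["i", "like", "black", "tea"]]
pair, total = train_pair_counts(corpus)
seen = score("tea", ["i", "like", "green"], pair, total)
unseen = score("coffee", ["i", "like", "green"], pair, total)
```

In this toy corpus, `seen` comes out larger than `unseen` because "tea" co-occurs with every history word, and nearer history words contribute more to the score than distant ones.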
Keywords
N-gram /
related-word model /
distance weighting /
data smoothing
Key words
N-gram /
related-word language model /
distance-weighted /
data smoothing