A Distance-Weighted Statistical Language Model and Its Application

JIN Ling, WU Wen-hu, ZHENG Fang, WU Gen-qing

Journal of Chinese Information Processing, 2001, Vol. 15, Issue 6: 48-53.

Application Of A Distance-weighted Statistical Language Model

  • JIN Ling,WU Wen-hu,ZHENG Fang,WU Gen-qing

Abstract

This paper proposes integrating inter-word distance information into the traditional N-gram statistical language model; the result is called the distance-weighted related-word statistical language model. The model takes into account relationships between non-adjacent words in a sentence. Based on the principle that words closer together are more closely related, a distance-weighting function introduces the distance information and improves the model's predictive ability. The model is also applied to a Chinese whole-sentence pinyin input method (IME) system. Experiments show that, compared with the traditional N-gram model, the proposed model reduces the Chinese character error rate and yields some improvement in model performance.
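The abstract does not give the model's formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a mixture of "distance bigrams": for each distance d, a probability P_d(w | v) that word w appears d positions after word v, combined with weights that shrink as d grows. The class name `DistanceWeightedBigram`, the exponential `distance_weight` function, the `decay` parameter, and the add-one smoothing are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def distance_weight(d, decay=0.5):
    """Hypothetical weighting function: 1.0 at distance 1, smaller as d grows."""
    return decay ** (d - 1)


class DistanceWeightedBigram:
    """Illustrative mixture of distance bigrams (an assumption, not the paper's model)."""

    def __init__(self, max_distance=3, decay=0.5):
        self.max_distance = max_distance
        self.decay = decay
        self.vocab = set()
        distances = range(1, max_distance + 1)
        # pair_counts[d][(v, w)]: how often w occurred d positions after v
        self.pair_counts = {d: defaultdict(int) for d in distances}
        # context_counts[d][v]: how often v had any word d positions after it
        self.context_counts = {d: defaultdict(int) for d in distances}

    def train(self, sentences):
        """Collect pair counts from an iterable of tokenized sentences."""
        for words in sentences:
            self.vocab.update(words)
            for i, w in enumerate(words):
                for d in range(1, self.max_distance + 1):
                    if i - d < 0:
                        break
                    v = words[i - d]
                    self.pair_counts[d][(v, w)] += 1
                    self.context_counts[d][v] += 1

    def pair_prob(self, d, v, w):
        """Add-one smoothed estimate of P_d(w | v)."""
        count = self.pair_counts[d].get((v, w), 0)
        context = self.context_counts[d].get(v, 0)
        return (count + 1) / (context + len(self.vocab))

    def prob(self, history, w):
        """Distance-weighted mixture over the last max_distance words of history."""
        reach = min(self.max_distance, len(history))
        if reach == 0:
            return 1 / len(self.vocab)  # no context: uniform over the vocabulary
        weights = [distance_weight(d, self.decay) for d in range(1, reach + 1)]
        total = sum(weights)
        return sum(
            (weight / total) * self.pair_prob(d, history[-d], w)
            for d, weight in zip(range(1, reach + 1), weights)
        )


if __name__ == "__main__":
    corpus = [["a", "x", "b"], ["a", "y", "b"], ["c", "x", "d"], ["c", "y", "d"]]
    model = DistanceWeightedBigram(max_distance=2)
    model.train(corpus)
    # The adjacent word "x" cannot separate "b" from "d"; the word two back can.
    print(model.prob(["a", "x"], "b") > model.prob(["a", "x"], "d"))
```

With `max_distance=1` the sketch reduces to an add-one smoothed bigram model, which makes it easy to compare against an N-gram baseline. In the toy corpus, the adjacent word alone is ambiguous, and the non-adjacent word supplies the information that settles the prediction.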

Key words

N-gram / related-word language model / distance-weighted / data smoothing

Cite this article

JIN Ling, WU Wen-hu, ZHENG Fang, WU Gen-qing. Application Of A Distance-weighted Statistical Language Model. Journal of Chinese Information Processing. 2001, 15(6): 48-53
