A Hybrid Semantic and Word Based Language Model and Its Applications

HOU Jun, WANG Zuo-ying

Journal of Chinese Information Processing, 2001, Vol. 15, Issue (6): 8-13.

Abstract

This paper proposes a hybrid statistical language model based on both words and word senses. The model is evaluated on word sense tagging and Mandarin speech recognition, and compared with a traditional semantic language model and word-based N-gram models. The hybrid model describes the relation between word senses and words more accurately than the traditional semantic model and achieves a lower perplexity in sense tagging. In continuous Mandarin speech recognition, it outperforms the word-based trigram model and requires less storage space.

Key words

statistical language model / semantic and word based language model / semantic tagging / speech recognition

Cite this article

HOU Jun, WANG Zuo-ying. A Hybrid Semantic and Word Based Language Model and Its Applications. Journal of Chinese Information Processing. 2001, 15(6): 8-13

Funding

National “985” Major Project (Human-Machine Natural Language Interaction Technology) (985校-22-攻关-06)