Abstract
This paper proposes a hybrid statistical language model based on both words and word senses. The model is evaluated on semantic tagging and on Mandarin speech recognition, and compared with a traditional semantic language model and with word-based N-gram language models. The hybrid model describes the relation between word senses and words more accurately than the traditional semantic model and achieves a lower perplexity in semantic tagging. In Mandarin continuous speech recognition, it outperforms the word-based trigram model and requires less storage space.
Key words
statistical language model /
semantic and word based language model /
semantic tagging /
speech recognition
Funding
National "985" Major Project (Human-Machine Natural Language Interaction Technology) (985校-22-攻关-06)