A New Topic-Based Language Model Adaptation

Journal of Chinese Information Processing (中文信息学报), 2006, Vol. 20, Issue 4: 84-89.

REN Ji-sheng (任纪生), WANG Zuo-ying (王作英)

Abstract

To meet the real-time requirements of speech recognition, a topic-based language model adaptation method should update the language model weighting coefficients as quickly as possible and reduce the number of language model calls. This paper uses clustering to build a quantized representation of consecutive adjacent bigram word pairs, and uses that representation to characterize both the speech recognition prediction history and each text topic center. The language model weighting coefficients are updated according to the similarity between the recognition history vector and each text topic center vector, and the global language model is discarded. Compared with the traditional adaptation method based on the EM algorithm, experiments show that the method clearly improves both recognition performance and real-time efficiency: the recognition error rate drops by 5.1% relative. This indicates that the method identifies the text topic of the test content fairly accurately.
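The weight update described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity measure, the normalization, or the cluster assignment, so everything concrete here is an assumption: cosine similarity, weights normalized to sum to one, and the hypothetical names `pair_cluster`, `history_vector`, `topic_weights`, and `mixture_prob`. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def history_vector(words, pair_cluster, num_clusters):
    """Count the cluster IDs of adjacent word pairs in the recognized history.

    pair_cluster is a hypothetical lookup from a (w1, w2) pair to the ID of
    the cluster it was quantized into; unseen pairs are skipped.
    """
    vec = [0.0] * num_clusters
    for w1, w2 in zip(words, words[1:]):
        cluster_id = pair_cluster.get((w1, w2))
        if cluster_id is not None:
            vec[cluster_id] += 1.0
    return vec


def topic_weights(history, centers):
    """Turn history-to-center similarities into mixture weights.

    Assumption: weights are proportional to non-negative cosine similarity.
    With no usable evidence, fall back to uniform weights.
    """
    sims = [max(cosine(history, center), 0.0) for center in centers]
    total = sum(sims)
    if total == 0.0:
        return [1.0 / len(centers)] * len(centers)
    return [s / total for s in sims]


def mixture_prob(weights, topic_probs):
    """Interpolate topic language model probabilities; no global model term."""
    return sum(w * p for w, p in zip(weights, topic_probs))


# Toy data, invented for illustration only.
pair_cluster = {("stock", "market"): 0, ("market", "fell"): 0, ("goal", "scored"): 1}
centers = [[0.9, 0.1], [0.1, 0.9]]  # one center vector per topic
history = history_vector(["stock", "market", "fell"], pair_cluster, num_clusters=2)
weights = topic_weights(history, centers)
prob = mixture_prob(weights, [0.02, 0.001])
```

Under these assumptions the update needs only one similarity computation per topic, with no iterative re-estimation, which is in line with the speed advantage the abstract claims over EM-based adaptation.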

Cite this article

REN Ji-sheng, WANG Zuo-ying. A New Topic-Based Language Model Adaptation. Journal of Chinese Information Processing. 2006, 20(4): 84-89

Funding

Supported by the National 863 Program of China (2001AA114071)


Received: 2005-06-28
Published: 2006-08-15
Issue date: 2006-08-15
