Abstract: Research shows that pronunciation differences between speakers can seriously degrade the performance of Uyghur speech recognition systems. Focusing on speaker adaptation, this paper applies maximum likelihood linear regression (MLLR), maximum a posteriori (MAP) estimation, and the combined MLLR+MAP method to the training of acoustic models for a Uyghur continuous speech recognition system. Experimental results show that the three speaker adaptation methods reduce the word error rate by 0.6%, 2.34%, and 2.57%, respectively.