Journal of Chinese Information Processing ›› 2010, Vol. 24 ›› Issue (5): 106-117.
Review

Survey of Feature Normalization Techniques for Robust Speech Recognition

XIAO Yunpeng, YE Weiping

Abstract

The performance of current automatic speech recognition (ASR) systems often deteriorates drastically when the input speech is corrupted by various kinds of noise. Such performance degradation is mainly caused by the mismatch between the training and recognition environments. Quite a few techniques have been proposed over the past several years to reduce this mismatch. Among them, feature-based normalization techniques are simple yet powerful, providing robustness against several forms of signal degradation, and are therefore often the preferred choice for robust speech recognition. They compensate for the effects of environmental mismatch by normalizing the statistical moments, the cumulative distribution function, or the power spectral density (PSD) of the feature vectors. This paper reviews the most commonly used feature normalization methods, including cepstral moment normalization, histogram equalization (HEQ), and modulation spectrum normalization.
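
To make the moment- and CDF-based normalization described above concrete, the following Python/NumPy sketch illustrates the two simplest cases: per-utterance cepstral mean and variance normalization (first- and second-moment normalization) and a rank-based histogram equalization of each cepstral dimension. It is not taken from the paper; the function names, the (num_frames, num_coeffs) feature layout, and the standard-normal reference distribution for equalization are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def cmvn(features, eps=1e-8):
    # Cepstral mean and variance normalization: make every cepstral
    # dimension zero-mean and unit-variance over the utterance.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

def heq_gaussian(features):
    # Histogram equalization: map the empirical CDF of each cepstral
    # dimension onto the CDF of a standard Gaussian reference.
    num_frames, num_coeffs = features.shape
    equalized = np.empty_like(features, dtype=float)
    for d in range(num_coeffs):
        ranks = np.argsort(np.argsort(features[:, d]))   # 0 .. num_frames-1
        empirical_cdf = (ranks + 0.5) / num_frames        # kept inside (0, 1)
        equalized[:, d] = norm.ppf(empirical_cdf)         # inverse Gaussian CDF
    return equalized

# Toy usage: an MFCC-like matrix distorted by an additive bias and a scale
# mismatch; both transforms remove the mismatch in the chosen statistics.
rng = np.random.default_rng(0)
noisy_feats = 3.0 * rng.normal(size=(200, 13)) + 5.0
print(np.round(cmvn(noisy_feats).mean(axis=0), 3))          # approximately all zeros
print(np.round(heq_gaussian(noisy_feats).mean(axis=0), 3))  # approximately all zeros

Applied consistently to both training and test utterances, such normalization matches the compared statistics of the two environments, which is the mismatch-reduction goal described in the abstract.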

Key words

robust speech recognition / cepstral mean normalization / high order cepstral moment normalization / histogram equalization / cepstral shape normalization

Cite this article

XIAO Yunpeng, YE Weiping. Survey of Feature Normalization Techniques for Robust Speech Recognition. Journal of Chinese Information Processing, 2010, 24(5): 106-117.

