Cepstral Shape Normalization (CSN) for Robust Speech Recognition

DU Jun, DAI Lirong, WANG Renhua

Journal of Chinese Information Processing, 2010, Vol. 24, Issue (2): 104-110.
Review



Abstract

This paper proposes a new feature normalization approach for robust speech recognition. We observe that the shape of speech feature distributions changes substantially in noisy environments compared with clean conditions. Accordingly, we propose Cepstral Shape Normalization (CSN), which normalizes the shape of the cepstral feature distributions by introducing an exponential factor. The method proves highly effective in noisy environments, especially at low signal-to-noise ratios (SNRs). Experimental results show that, compared with conventional Mean and Variance Normalization (MVN), CSN yields relative word error rate reductions of 38% and 25% on the Aurora2 and Aurora3 databases, respectively. CSN also consistently outperforms other traditional methods, such as Histogram Equalization (HEQ) and Higher Order Cepstral Moment Normalization (HOCMN).
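The abstract does not give the CSN formula, so the following is only a minimal sketch of the general idea, under stated assumptions. The `mvn` function is the standard per-utterance mean and variance normalization used as the baseline. The `shape_normalize` function and its sign-preserving power law `sign(x) * |x|**alpha` are hypothetical stand-ins for "an exponential factor that changes the distribution shape" and are not taken from the paper. The `kurtosis` helper is likewise an illustrative way to measure shape.

```python
import numpy as np


def mvn(features: np.ndarray) -> np.ndarray:
    """Mean and variance normalization, per cepstral dimension.

    features: array of shape (num_frames, num_coeffs) for one utterance.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # guard against constant dimensions
    return (features - mean) / std


def shape_normalize(features: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical shape normalization: MVN, power-law warp, MVN again.

    ASSUMPTION: y = sign(x) * |x|**alpha is an illustrative choice of
    exponential factor, not the formula from the paper.
    """
    x = mvn(features)
    y = np.sign(x) * np.abs(x) ** alpha  # alpha < 1 compresses the tails
    return mvn(y)  # restore zero mean and unit variance after warping


def kurtosis(features: np.ndarray) -> np.ndarray:
    """Fourth standardized moment per dimension (3.0 for a Gaussian)."""
    return (mvn(features) ** 4).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic heavy-tailed "features"; real input would be cepstral frames.
    feats = rng.laplace(size=(5000, 3))
    warped = shape_normalize(feats, alpha=0.5)
    print("kurtosis before:", kurtosis(feats))
    print("kurtosis after: ", kurtosis(warped))
```

MVN alone fixes only the first two moments. The extra exponent changes the tail weight of the distribution, which is the kind of shape mismatch between clean and noisy features that the abstract describes. How the exponent is chosen in the actual method is not stated here, so `alpha` is left as a free parameter.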
Key words

computer application / Chinese information processing / robust speech recognition / shape normalization

Cite this article

DU Jun, DAI Lirong, WANG Renhua. Cepstral Shape Normalization (CSN) for Robust Speech Recognition. Journal of Chinese Information Processing. 2010, 24(2): 104-110
