Abstract
This paper proposes a discriminative feature extraction algorithm based on adaptive frequency warping. The discriminative power of each frequency band of the speech spectrum is analyzed and quantified, the frequency axis is warped adaptively in each band according to the result, and the feature is extracted with non-uniform sub-band filters designed on the warped scale. In noisy environments, a pre-enhancement stage is applied at the front end before feature extraction to overcome the mismatch between training and testing speech and to keep the extracted features reliable. Experiments show that the proposed feature is simple in principle, stable, independent of speech content, and robust to noise, and that it is more discriminative and robust than conventional Mel-frequency cepstral coefficients. Combining pre-enhancement with the proposed feature further improves the recognition rate and robustness of the speaker identification system.
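The idea of placing non-uniform sub-band filters according to a per-band discriminative score can be illustrated with a minimal sketch. The abstract does not state which discriminative measure or warping rule the paper uses, so everything here is an assumption made for illustration: the F-ratio (between-speaker variance over within-speaker variance) stands in for the discriminative analysis, and filter centers are placed so that each filter covers an equal share of the accumulated score. The function names `f_ratio` and `warped_center_freqs` are hypothetical.

```python
import numpy as np

def f_ratio(band_energies, speaker_ids):
    """Per-band F-ratio (assumed discriminative measure, not confirmed by the paper).

    band_energies: array (n_frames, n_bands) of per-band log energies.
    speaker_ids:   array (n_frames,) of speaker labels.
    Returns the variance of the speaker means divided by the mean
    within-speaker variance, one value per band.
    """
    speakers = np.unique(speaker_ids)
    means = np.stack([band_energies[speaker_ids == s].mean(axis=0) for s in speakers])
    within = np.stack([band_energies[speaker_ids == s].var(axis=0) for s in speakers])
    return means.var(axis=0) / (within.mean(axis=0) + 1e-12)

def warped_center_freqs(scores, band_edges_hz, n_filters):
    """Place filter centers so each filter covers an equal share of the score.

    scores:        array (n_bands,) of non-negative discriminative scores.
    band_edges_hz: array (n_bands + 1,) of increasing band edges in Hz.
    Bands with a high score receive more (hence narrower) filters.
    """
    scores = np.asarray(scores, dtype=float) + 1e-9  # keep the CDF strictly increasing
    cdf = np.concatenate([[0.0], np.cumsum(scores)])
    cdf /= cdf[-1]
    targets = (np.arange(n_filters) + 0.5) / n_filters
    # Invert the piecewise-linear cumulative score to map targets back to Hz.
    return np.interp(targets, cdf, band_edges_hz)

# Example: the first band is three times as discriminative as the second,
# so three of the four filter centers fall below 1000 Hz.
centers = warped_center_freqs([3.0, 1.0], np.array([0.0, 1000.0, 2000.0]), 4)
print(centers)
```

With uniform scores the centers fall back to an evenly spaced grid, which makes the sketch easy to sanity-check before applying it to measured scores.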
Key words
computer application; Chinese information processing; speaker identification; adaptive frequency warping; discriminative feature; robustness
Funding
Scientific Research Project of the Zhejiang Provincial Department of Education (Y200805349)