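This paper builds a text-independent speaker verification system for conversational speech. Its key difference from a conventional text-independent speaker verification system is that the training and test speech no longer contains a single speaker but is always conversational. The speech must therefore be split into segments belonging to different speakers in order to build the speaker models and make the final decision. The paper describes in detail the framework of the Gaussian mixture model with universal background model (GMM-UBM) for speaker verification, and focuses on an unsupervised speech segmentation algorithm based on the GLR (Generalized Likelihood Ratio) distance measure. Finally, it presents two output score normalization methods, ZNORM (Zero Normalization) and duration correction, which improve the performance of the verification system by nearly 10%.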
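The GLR distance behind the unsupervised segmentation step can be illustrated with a small sketch. It assumes each segment is modelled by a single Gaussian with diagonal covariance. This keeps the code within the Python standard library; full covariances are also common. Speaker changes are taken as local maxima of the distance between two adjacent sliding windows. The function names, window length, step and threshold are hypothetical choices for illustration, not the paper's settings. Grouping the resulting segments by speaker is not shown.

```python
import math
import random
from typing import List, Sequence

Frame = Sequence[float]


def log_det_diag(frames: Sequence[Frame], var_floor: float = 1e-6) -> float:
    """log|Sigma| of an ML-estimated diagonal-covariance Gaussian (sum of log variances)."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, var_floor))  # floor guards against log(0)
    return total


def glr_distance(x1: Sequence[Frame], x2: Sequence[Frame]) -> float:
    """GLR distance between two segments, each modelled by one Gaussian.

    d = log L(x1|theta1) + log L(x2|theta2) - log L(x1 + x2|theta).
    With ML Gaussians the constant terms cancel, leaving
    0.5 * (N log|S| - N1 log|S1| - N2 log|S2|), where N = N1 + N2.
    """
    n1, n2 = len(x1), len(x2)
    merged = list(x1) + list(x2)
    return 0.5 * ((n1 + n2) * log_det_diag(merged)
                  - n1 * log_det_diag(x1)
                  - n2 * log_det_diag(x2))


def detect_changes(frames: Sequence[Frame], win: int = 50, step: int = 10,
                   threshold: float = 40.0) -> List[int]:
    """Hypothetical detector: local GLR maxima above a threshold between adjacent windows."""
    scores = [(t, glr_distance(frames[t - win:t], frames[t:t + win]))
              for t in range(win, len(frames) - win + 1, step)]
    changes = []
    for i, (t, d) in enumerate(scores):
        left = scores[i - 1][1] if i > 0 else -math.inf
        right = scores[i + 1][1] if i + 1 < len(scores) else -math.inf
        if d > threshold and d >= left and d > right:
            changes.append(t)
    return changes


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 2-D "features": speaker A centred at 0, speaker B centred at 3.
    frames = ([[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
              + [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(200)])
    print(detect_changes(frames))
```

On this synthetic input the detector should report one change point near frame 200. On real cepstral features the threshold would have to be tuned, because the GLR distance grows with window length and feature dimension.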
Abstract
In this paper, a text-independent speaker verification system for conversational speech is proposed. The key difference between this system and a conventional single-speaker verification system is that both the training and the test speech are conversations. The speech is therefore segmented by speaker in order to train the speaker models and make the final decision. The GMM-UBM framework is introduced, with emphasis on an unsupervised speech segmentation algorithm based on the GLR distance measure. Finally, score normalization by ZNORM and a duration penalty improves performance by nearly 10%.
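The GMM-UBM scoring and ZNORM steps named in the abstract can be sketched as follows. This is a minimal stdlib-only sketch under several assumptions:

- Both models use diagonal covariances and are already trained. Training the UBM and adapting the target model from it are not shown.
- The raw score is the average per-frame log-likelihood ratio between the target model and the UBM.
- The ZNORM statistics come from scoring impostor utterances against the target model.

The class and function names (`DiagGMM`, `llr_score`, `znorm_params`, `znorm`) are hypothetical. The abstract does not give the form of the duration correction, so none is included. Averaging over frames only removes the linear growth of a summed score with utterance length.

```python
import math
from typing import Sequence, Tuple


class DiagGMM:
    """Diagonal-covariance GMM with fixed, already-trained parameters (assumption)."""

    def __init__(self, weights: Sequence[float],
                 means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]) -> None:
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | GMM), combining components with a numerically stable log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xk, mk, vk in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vk) + (xk - mk) ** 2 / vk)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def llr_score(frames: Sequence[Sequence[float]], target: DiagGMM, ubm: DiagGMM) -> float:
    """Average per-frame log-likelihood ratio: log p(x|target) - log p(x|UBM)."""
    total = sum(target.frame_log_likelihood(x) - ubm.frame_log_likelihood(x)
                for x in frames)
    return total / len(frames)


def znorm_params(impostor_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and (sample) standard deviation of the target model's impostor scores."""
    n = len(impostor_scores)
    mu = sum(impostor_scores) / n
    var = sum((s - mu) ** 2 for s in impostor_scores) / (n - 1)
    return mu, math.sqrt(var)


def znorm(score: float, mu: float, sigma: float) -> float:
    """Zero normalization: shift and scale a raw score by the impostor statistics."""
    return (score - mu) / sigma


if __name__ == "__main__":
    ubm = DiagGMM([1.0], [[0.0]], [[1.0]])     # toy 1-D "UBM"
    target = DiagGMM([1.0], [[1.0]], [[1.0]])  # toy 1-D target model
    raw = llr_score([[1.0], [0.8], [1.2]], target, ubm)  # about 0.5
    mu, sigma = znorm_params([-0.5, 0.0, 0.5])           # impostor scores
    print(raw, znorm(raw, mu, sigma))                    # about 0.5 and 1.0
```

Because ZNORM parameters depend only on the target model and a fixed impostor set, they can be computed once, when the model is trained. At test time each trial then costs only one subtraction and one division.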
Keywords
computer application /
Chinese information processing /
conversational speech /
GLR distance measure /
unsupervised speech segmentation
Key words
computer application /
Chinese information processing /
conversation /
GLR distance measure /
unsupervised speech segmentation
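Funding
Supported by the National Natural Science Foundation of China (60272039) and the Natural Science Foundation of Anhui Province (01042205).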