Abstract
In language identification systems, performance is substantially degraded by session variability, which includes speaker variability and channel variability. In this paper, factor analysis is introduced to estimate the subspace in which this session variability lies, and a statistical model of that subspace is constructed to suit the characteristics of the language identification task. Based on this model, compensation methods are proposed in both the model domain and the feature domain. On the 30 s test condition of the NIST LRE 2007 evaluation, the proposed method reduces the equal error rate (EER) by 36.5% relative to the GMM-UBM baseline system.
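The abstract names the technique but not its equations. As a hedged illustration only, the sketch below assumes the common eigenchannel formulation from the factor-analysis literature, in which the session-dependent GMM mean supervector is written M = m + Ux: m is the UBM mean supervector, U is a low-rank session-variability subspace (assumed to be already trained), and x is a session factor with a standard-normal prior. It estimates x from one utterance's Baum-Welch statistics, then applies model-domain compensation (shifting each mean by U_c x) and feature-domain compensation (subtracting the posterior-weighted offset from each frame). The diagonal-covariance UBM, the centering of statistics on the UBM means, and the helper names (`estimate_session_factor`, `session_offset`, `compensate_means`, `compensate_features`) are hypothetical choices for illustration, not the paper's exact algorithm; training of U and language-model scoring are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (small R x R systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def log_gauss_diag(o: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (oi - mi) ** 2 / v
                      for oi, mi, v in zip(o, mean, var))


def posteriors(o: Vector, weights: Vector, means: Matrix, variances: Matrix) -> Vector:
    """Occupation probabilities gamma_c(t) of one frame under the UBM (log-sum-exp)."""
    logs = [math.log(w) + log_gauss_diag(o, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_session_factor(frames, weights, means, variances, U) -> Vector:
    """Posterior mean of x in M = m + U x (prior x ~ N(0, I)); U[c] is the F x R block of Gaussian c."""
    C, F, R = len(means), len(means[0]), len(U[0][0])
    N = [0.0] * C                       # zeroth-order statistics
    Fc = [[0.0] * F for _ in range(C)]  # first-order stats centered on UBM means
    for o in frames:
        g = posteriors(o, weights, means, variances)
        for c in range(C):
            N[c] += g[c]
            for f in range(F):
                Fc[c][f] += g[c] * (o[f] - means[c][f])
    # Precision L = I + sum_c N_c U_c^T S_c^-1 U_c and rhs = sum_c U_c^T S_c^-1 Fc_c
    L = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]
    rhs = [0.0] * R
    for c in range(C):
        for f in range(F):
            inv = 1.0 / variances[c][f]
            for i in range(R):
                rhs[i] += U[c][f][i] * inv * Fc[c][f]
                for j in range(R):
                    L[i][j] += N[c] * U[c][f][i] * inv * U[c][f][j]
    return solve(L, rhs)


def session_offset(U, x: Vector) -> Matrix:
    """Per-Gaussian mean offset U_c x."""
    return [[sum(row[r] * x[r] for r in range(len(x))) for row in Uc] for Uc in U]


def compensate_means(means: Matrix, U, x: Vector) -> Matrix:
    """Model-domain compensation: session-adapted means m_c + U_c x."""
    return [[m + d for m, d in zip(mc, dc)] for mc, dc in zip(means, session_offset(U, x))]


def compensate_features(frames, weights, means, variances, U, x: Vector) -> Matrix:
    """Feature-domain compensation: o_t - sum_c gamma_c(t) U_c x."""
    offset = session_offset(U, x)
    out = []
    for o in frames:
        g = posteriors(o, weights, means, variances)
        out.append([o[f] - sum(g[c] * offset[c][f] for c in range(len(means)))
                    for f in range(len(o))])
    return out


if __name__ == "__main__":
    # Toy UBM: 2 Gaussians in 2-D; the session shifts every frame by +2 along dimension 0.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [10.0, 10.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    U = [[[1.0], [0.0]], [[1.0], [0.0]]]  # hypothetical rank-1 session subspace
    frames = [[2.0, 0.0]] * 50 + [[12.0, 10.0]] * 50
    x = estimate_session_factor(frames, weights, means, variances, U)
    print("session factor:", x)
    print("compensated first frame:", compensate_features(frames, weights, means, variances, U, x)[0])
```

With the toy data, the estimated factor is about 1.98 rather than the true offset of 2, because the standard-normal prior shrinks x toward zero; the shrinkage fades as the occupation counts N_c grow. In a full system, U would typically be trained over many sessions per language, and the compensated features or means would then be used to train and score the language GMMs.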
Key words: computer application; Chinese information processing; language identification; Gaussian mixture model (GMM); factor analysis