An Improved System for Short-term and Confusing Language Recognition

LI Zhuoxi, GAO Zhen, WANG Hua, LIU Junnan, ZHU Guangxu

Journal of Chinese Information Processing, 2019, Vol. 33, Issue 10: 135-142.
Natural Language Processing Applications

Abstract

This paper studies language recognition for short utterances (segments lasting 1 s or less) and for easily confused languages. Using the Oriental Multilingual Recognition Competition dataset, it compares the phoneme log-likelihood ratio feature, the Mel-frequency cepstral coefficient feature, and the deep bottleneck feature (DBF) on both tasks, and shows that the DBF performs well on both. To improve recognition accuracy further, the paper proposes an improved DBF-I-VECTOR language recognition system. Compared with the baseline DBF-I-VECTOR system, the improved system reduces the best equal error rate (EER) from 12.26% to 10.55% on the short-utterance task and from 5.53% to 2.86% on the confusing-language task. A comparison of back-end classifiers for the improved system, namely Cosine Distance Scoring (CDS), Probabilistic Linear Discriminant Analysis (PLDA), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF), finds that RF classifies best on the short-utterance task and SVM classifies best on the confusing-language task.
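The results above are reported as equal error rates. As a rough illustration of that metric only, the sketch below sweeps a decision threshold over trial scores and returns the point where the false-acceptance and false-rejection rates are closest. The function name `equal_error_rate` and the toy scores are assumptions made for this example; they are not taken from the paper or from its evaluation tooling.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
) -> Tuple[float, float]:
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-acceptance
    rate (FAR) and false-rejection rate (FRR) are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = 0.0
    best_threshold = 0.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # FRR: fraction of genuine (target) trials that are rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # FAR: fraction of impostor (non-target) trials that are accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold
    return best_eer, best_threshold


# Toy scores (hypothetical): one target trial scores below one non-target trial.
eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {thr}")  # EER = 25.00% at threshold 0.6
```

With a finite set of trials the two error rates may never be exactly equal, so this sketch reports the average of FAR and FRR at the closest threshold; evaluation toolkits often interpolate instead.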

Key words

short utterance / confusing language / language recognition / speech feature

Cite this article

LI Zhuoxi, GAO Zhen, WANG Hua, LIU Junnan, ZHU Guangxu. An Improved System for Short-term and Confusing Language Recognition. Journal of Chinese Information Processing. 2019, 33(10): 135-142

Funding

Tianjin Science and Technology Commission, "Research on Multilingual Intelligent Information Systems" (17ZXRGGX00160)