A Speaker Feature Extraction Method Based on I-vector Resource Fusion

HAN Jiajun1, MA Zhiqiang1,2, WANG Hongbin1, XIE Xiulan1

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (1): 71-78.
Minority, Cross-border and Neighboring Language Information Processing

Abstract

To address the poor performance of Mongolian speaker-adaptive speech recognition caused by the scarcity of Mongolian corpora, this paper proposes a speaker feature extraction method based on I-vector feature fusion. First, separate I-vector models are trained on a low-resource corpus and a high-resource corpus. The I-vector features extracted by the two models are then used as intermediate data for a final feature-fusion training stage. Experiments on the Mongolian and TIMIT corpora show that the fused I-vector speaker features outperform the pre-fusion I-vector features, reducing the average WER by 0.7% and the average SER by 3.1%.
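
The abstract only outlines the fusion step; purely to illustrate the idea, below is a minimal sketch in PyTorch, assuming two fixed-dimensional utterance-level i-vectors (100 dimensions each, a common default) that are concatenated and mapped through a small learned projection. The class name, dimensions, and architecture are illustrative assumptions, not the paper's actual model.

# Minimal sketch of i-vector feature fusion in PyTorch. The dimensions,
# class name, and architecture are illustrative assumptions, not the
# paper's exact setup.
import torch
import torch.nn as nn

class IVectorFusion(nn.Module):
    def __init__(self, low_dim=100, high_dim=100, fused_dim=100):
        super().__init__()
        # Learn a joint projection over the concatenated i-vectors.
        self.fuse = nn.Sequential(
            nn.Linear(low_dim + high_dim, fused_dim),
            nn.Tanh(),
        )

    def forward(self, ivec_low, ivec_high):
        # ivec_low: i-vector from the model trained on the low-resource
        # (Mongolian) corpus; ivec_high: from the high-resource corpus.
        return self.fuse(torch.cat([ivec_low, ivec_high], dim=-1))

# Usage: fuse a batch of 8 i-vector pairs into 8 speaker features.
model = IVectorFusion()
fused = model(torch.randn(8, 100), torch.randn(8, 100))
print(fused.shape)  # torch.Size([8, 100])

In a typical i-vector speaker-adaptation setup, the resulting speaker feature is appended to the per-frame acoustic features as an auxiliary input to the acoustic model. For the reported metrics: WER is the word error rate (substitutions, deletions, and insertions divided by the number of reference words), and SER is the sentence error rate, i.e., the fraction of sentences containing at least one recognition error.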

Key words

I-vector / speaker adaptation / feature extraction / Mongolian / low resource

Cite this article

HAN Jiajun, MA Zhiqiang, WANG Hongbin, XIE Xiulan. A Speaker Feature Extraction Method Based on I-vector Resource Fusion. Journal of Chinese Information Processing, 2023, 37(1): 71-78.


Funding

National Natural Science Foundation of China (61762070, 61862048); Natural Science Foundation of Inner Mongolia Autonomous Region (2019MS06004); Science and Technology Major Project of Inner Mongolia Autonomous Region (2019ZD015); Key Technology Research Plan of Inner Mongolia Autonomous Region (2019GG273)