Abstract
Speech synthesis is one of the core technologies of human-computer interaction and a frontier topic in Chinese information processing. As neural network theory has matured, speech synthesis based on neural networks has attracted growing attention. Building on an analysis of Tibetan character structure and Tibetan spelling rules, this paper studies neural-network-based Tibetan speech synthesis by combining the Sequence to Sequence model with an attention mechanism. Experimental results show that the method performs well on Tibetan speech synthesis.
Key words
Tibetan speech synthesis /
neural network /
Sequence to Sequence model /
attention mechanism
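Funding
National Natural Science Foundation of China (61866032, 61163018, 61262051, 61662061); National Social Science Foundation of China (13BYY141, 16BYY167, 15BYY167); Ministry of Education "Chunhui Plan" cooperative research projects (Z2012093, Z2016077); Qinghai Provincial Department of Science and Technology projects (2017-ZJ-767, 2019-SF-129); "Program for Changjiang Scholars and Innovative Research Team in University" innovative team funding (IRT1068); Qinghai Provincial Key Laboratory projects (2013-Z-Y17, 2014-Z-Y32, 2015-Z-Y03); Key Laboratory of Tibetan Information Processing and Machine Translation (2013-Y-17); Qinghai Normal University 2017 Innovation Training Program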