HUANG Xiaohui, LI Jing. The Acoustic Model for Tibetan Speech Recognition Based on Recurrent Neural Network[J]. Journal of Chinese Information Processing, 2018, 32(5): 49-55.
The Acoustic Model for Tibetan Speech Recognition Based on Recurrent Neural Network
HUANG Xiaohui 1,2, LI Jing 1
1. College of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China; 2. PLA University of Foreign Languages, Luoyang, Henan 471003, China
Abstract: The recurrent neural network (RNN) and the connectionist temporal classification (CTC) algorithm are applied to acoustic modeling for Tibetan speech recognition, enabling end-to-end model training. Based on the length relationship between the input and output sequences of the acoustic model, a time-domain convolution operation on the output sequence of the hidden layer is introduced to reduce the time-domain expansion of the network's hidden layers. Experimental results show that, compared with traditional acoustic models based on the Hidden Markov Model, the RNN model achieves better recognition performance on Tibetan Lhasa phoneme recognition, and the RNN acoustic model with time-domain convolution offers higher training and decoding efficiency while maintaining the same recognition performance.
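To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of an RNN acoustic model trained with CTC, where a strided time-domain convolution is applied to the hidden-layer output sequence to shorten it before the output layer. The layer sizes, stride, feature dimension, phoneme inventory size, and the use of a bidirectional LSTM are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class RNNCTCAcousticModel(nn.Module):
    def __init__(self, feat_dim=39, hidden=256, num_phonemes=60, stride=2):
        super().__init__()
        # Bidirectional LSTM over the acoustic feature frames (assumed setup).
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        # 1-D convolution along the time axis of the hidden outputs; a stride > 1
        # shortens the sequence that the output layer and CTC must process.
        self.time_conv = nn.Conv1d(2 * hidden, 2 * hidden,
                                   kernel_size=stride, stride=stride)
        # One extra output unit for the CTC blank label.
        self.output = nn.Linear(2 * hidden, num_phonemes + 1)

    def forward(self, x):                       # x: (batch, frames, feat_dim)
        h, _ = self.rnn(x)                      # (batch, frames, 2*hidden)
        h = self.time_conv(h.transpose(1, 2))   # convolve over time
        h = h.transpose(1, 2)                   # (batch, frames//stride, 2*hidden)
        return self.output(h)                   # logits for CTC

# CTC training step on dummy data, to show how the pieces fit together.
model = RNNCTCAcousticModel()
ctc = nn.CTCLoss(blank=0)
x = torch.randn(4, 200, 39)                     # 4 utterances, 200 frames each
logits = model(x)                               # (4, 100, 61) with stride=2
log_probs = logits.log_softmax(-1).transpose(0, 1)   # CTC expects (T, batch, classes)
targets = torch.randint(1, 61, (4, 20))         # dummy phoneme label sequences
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
```

In this sketch the strided convolution halves the number of time steps fed to the softmax and CTC layers, which is the mechanism the abstract credits for the gain in training and decoding efficiency.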