Abstract: Prosodic boundaries play an important role in the naturalness and intelligibility of spoken language, so prosody modeling is an important aspect of speech synthesis and understanding. Focusing on the interaction between adjacent tones, we propose a prosodic boundary detection method based on the tone nucleus and a deep neural network (DNN) model. The method computes boundary-related parameters from tone nucleus features and then models these parameters with the DNN. For comparison, the baseline system uses the syllable as the unit for acoustic feature extraction. Experimental results show that the proposed method achieves a relative improvement of 4% over the baseline.