Automatic Mandarin Prosody Boundary Detection Based on Tone Nucleus and DNN Model

LIN Ju; XIE Yanlu; ZHANG Jinsong; ZHANG Wei

Journal of Chinese Information Processing ›› 2016, Vol. 30 ›› Issue (6) : 35-39.
Survey


Abstract

Prosodic boundaries play an important role in the naturalness and intelligibility of speech, and prosody modeling is an important aspect of speech synthesis and speech understanding. Starting from the interaction between adjacent tones, this paper proposes a Mandarin prosodic boundary detection method based on a deep neural network (DNN) and the acoustic features of the tone nucleus. The method first computes boundary-related parameters from the acoustic features of the tone nucleus portion of each syllable, and then models these parameters with a deep neural network. For comparison, a baseline system uses the acoustic features of the whole syllable as its input. The experimental results show that the system using only tone nucleus features outperforms the whole-syllable system, with a relative improvement of 4% in prosodic boundary detection accuracy, which indicates that the proposed method is effective.
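The contrast between tone nucleus input and whole-syllable input can be illustrated with a minimal sketch. The abstract does not list the actual boundary-related parameters, so the three features below (pitch reset, change in mean F0, change in F0 range), the F0 values, and the nucleus spans are all assumptions chosen for illustration; they are not the authors' feature set. In a full system, vectors like these would be the input to the DNN classifier.

```python
from statistics import mean

def segment_features(f0, start, end):
    """Summarize the F0 contour f0[start:end] (Hz) of one syllable segment."""
    seg = f0[start:end]
    return {
        "mean": mean(seg),
        "onset": seg[0],
        "offset": seg[-1],
        "range": max(seg) - min(seg),
    }

def boundary_features(left, right):
    """Hypothetical juncture features between two adjacent syllables."""
    return [
        right["onset"] - left["offset"],  # pitch reset across the juncture
        right["mean"] - left["mean"],     # change in mean F0
        right["range"] - left["range"],   # change in F0 range
    ]

# Made-up F0 contours (Hz) for two adjacent syllables.
left_f0 = [210.0, 200.0, 190.0, 180.0, 170.0, 165.0]
right_f0 = [175.0, 220.0, 230.0, 240.0, 235.0, 215.0]

# Assumed tone nucleus spans: the edge frames, which carry the transitions
# from and to the neighbouring tones, are excluded.
nucleus_span = (1, 5)

# Baseline-style input: features over the whole syllable.
whole_vec = boundary_features(
    segment_features(left_f0, 0, len(left_f0)),
    segment_features(right_f0, 0, len(right_f0)),
)

# Proposed-style input: features over the tone nucleus only.
nucleus_vec = boundary_features(
    segment_features(left_f0, *nucleus_span),
    segment_features(right_f0, *nucleus_span),
)

print("whole syllable:", whole_vec)
print("tone nucleus:  ", nucleus_vec)
```

With these made-up contours, the pitch reset measured on the nucleus (50 Hz) is much larger than the one measured on the whole syllable (10 Hz), because the transitional frames at the syllable edges pull the two contours toward each other. This is only a toy illustration of why nucleus-based features might separate boundaries more clearly.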

Key words

prosody boundary modeling / tone nucleus / deep neural network
 

Cite this article

LIN Ju; XIE Yanlu; ZHANG Jinsong; ZHANG Wei. Automatic Mandarin Prosody Boundary Detection Based on Tone Nucleus and DNN Model. Journal of Chinese Information Processing. 2016, 30(6): 35-39


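Funding

Supported by the Wutong Innovation Platform Project of Beijing Language and Culture University (Fundamental Research Funds for the Central Universities) (16PT05) and the Graduate Innovation Fund of Beijing Language and Culture University (Fundamental Research Funds for the Central Universities) (16YCX163).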