A Smoothing Method for Voiced Units Concatenation Based on Time-Domain Unit Fusion

Journal of Chinese Information Processing, 2006, Vol. 20, Issue 5: 73-78.

GUO Wu (郭武), WU Yi-jian (吴义坚)

Abstract

Corpus-based concatenative speech synthesis has become popular because it produces high-quality speech. However, the concatenated speech often suffers from discontinuities between acoustic units, caused by contextual differences and variations in speaking style across the database. Mismatches at joins between voiced units are especially damaging to the synthesized speech. This paper proposes a smoothing method, time-domain unit fusion (TD-UF), to smooth the discontinuities between voiced units. The method selects an appropriate transition template as the fusion unit by template (periodic) matching in the time domain, performing phase alignment at the same time, and then fuses the concatenated units with the fusion unit by pitch-synchronous overlap-add in the time domain using TD-PSOLA. Its advantages are that it does very little damage to voice quality and, because it operates directly in the time domain, it is efficient. Comparisons of spectrograms and of subjective listening impressions before and after smoothing show a clear improvement after smoothing.
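The abstract names three steps: matching a transition template, phase alignment, and pitch-synchronous overlap-add fusion. The Python sketch below illustrates the general idea only and is not the authors' implementation. It assumes a constant pitch period in samples, a template that has already been selected, and a triangular blending weight. The function names `hann`, `best_lag`, and `fuse` are hypothetical, and the cross-correlation search stands in for whatever phase-alignment procedure the paper actually uses.

```python
import math


def hann(n):
    """Periodic Hann window of length n; two copies shifted by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def best_lag(ref, cand, max_lag):
    """Assumed phase alignment: the lag at which cand best correlates with ref.

    A result of `lag` means ref[i] lines up with cand[i + lag].
    """
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[i] * cand[i + lag]
                    for i in range(len(ref)) if 0 <= i + lag < len(cand))
        if score > best_score:
            best, best_score = lag, score
    return best


def fuse(unit, template, period, start):
    """Blend `template` into `unit` from sample `start`, one pitch period per hop.

    Grains are two periods long and Hann-windowed, as in TD-PSOLA. Each grain
    adds a weighted, windowed difference (template - unit), so samples outside
    the grains are left untouched. The triangular weight (an assumption) peaks
    at the middle grain and falls off towards both ends of the template.
    """
    out = list(unit)
    win = hann(2 * period)
    n_grains = len(template) // period - 1  # grains that fit inside template
    for k in range(n_grains):
        w = 1.0 - abs(2.0 * (k + 1) / (n_grains + 1) - 1.0)
        off = k * period
        for i in range(2 * period):
            t = start + off + i
            if 0 <= t < len(unit):
                out[t] += w * win[i] * (template[off + i] - unit[t])
    return out
```

In use, one would first call `best_lag` on a stretch of the concatenated signal around the join and the candidate template, shift the template by the returned lag, and then pass the aligned template to `fuse`. Adding the windowed difference rather than rebuilding the whole region keeps every sample outside the fusion region bit-identical to the input, which matches the abstract's claim that the method does little damage to voice quality.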

Cite this article

GUO Wu, WU Yi-jian. A Smoothing Method for Voiced Units Concatenation Based on Time-Domain Unit Fusion. Journal of Chinese Information Processing. 2006, 20(5): 73-78



Received: 2005-09-20
Revised: 2015-12-03
Published: 2006-10-16
Issue date: 2006-10-16
