A Smoothing Method for Voiced Units Concatenation Based on Time-Domain Unit Fusion

GUO Wu, WU Yi-jian

Journal of Chinese Information Processing ›› 2006, Vol. 20 ›› Issue (5): 73-78.

Abstract

Corpus-based concatenative speech synthesis has become popular because of the high quality of the speech it produces. However, the concatenated speech often suffers from discontinuities between acoustic units, caused by contextual differences and variations in speaking style across the database. Mismatches at joins between voiced units are especially damaging to the synthesized result. This paper proposes a smoothing method, time-domain unit fusion (TD-UF), to smooth the discontinuities between voiced units. The method selects an appropriate transition template as the fusion unit by template (period) matching in the time domain, aligning phase at the same time, and then fuses the concatenated unit and the fusion unit by pitch-synchronous overlap-add in the time domain using TD-PSOLA. The method causes very little loss of sound quality and, because it operates directly in the time domain, it is efficient. Comparisons of spectrograms and of subjective listening impressions before and after smoothing show a clear improvement after smoothing.
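The abstract gives only the outline of the method, so the following is a rough sketch of the two steps it names, not the authors' implementation. It assumes a constant pitch period, uses plain cross-correlation as the matching and alignment criterion, and ramps the mixing weight linearly; the function names `best_lag` and `psola_fuse` and all parameters are hypothetical.

```python
import math

def hann(n):
    # Periodic Hann window: copies shifted by n/2 samples sum to one.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def best_lag(ref, cand, max_lag):
    """Shift of `cand` (0..max_lag) that best matches `ref` by cross-correlation.

    Stands in for the matching and phase-alignment step (assumed criterion).
    Requires len(cand) >= len(ref) + max_lag.
    """
    def corr(lag):
        return sum(r * cand[lag + i] for i, r in enumerate(ref))
    return max(range(max_lag + 1), key=corr)

def psola_fuse(unit, template, period, weights):
    """Pitch-synchronous overlap-add of two-period Hann frames at hop `period`.

    Frame k is mixed as (1 - w) * unit + w * template with w = weights[k].
    Both inputs need at least (len(weights) + 1) * period samples.
    """
    win = hann(2 * period)
    out = [0.0] * ((len(weights) + 1) * period)
    for k, w in enumerate(weights):
        start = k * period
        for i in range(2 * period):
            mixed = (1 - w) * unit[start + i] + w * template[start + i]
            out[start + i] += win[i] * mixed
    return out

# Toy data: a 40-sample-period sine and a phase-shifted copy as the "template".
PERIOD = 40
unit = [math.sin(2 * math.pi * n / PERIOD) for n in range(400)]
template_raw = [math.sin(2 * math.pi * (n - 13) / PERIOD) for n in range(400)]

lag = best_lag(unit[:PERIOD], template_raw, PERIOD - 1)  # phase alignment
aligned = template_raw[lag:]
fused = psola_fuse(unit, aligned, PERIOD, [0.0, 0.25, 0.5, 0.75, 1.0])
```

Because the overlapped Hann windows sum to one between the first and last pitch marks, the fused signal there is a smooth, period-by-period blend from the unit to the aligned template. Real speech would need estimated pitch marks instead of a fixed period.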

Key words

computer application / Chinese information processing / time-domain unit fusion / concatenated unit / fusion unit

Cite this article

GUO Wu, WU Yi-jian. A Smoothing Method for Voiced Units Concatenation Based on Time-Domain Unit Fusion. Journal of Chinese Information Processing. 2006, 20(5): 73-78
