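Study on the Identification of the Isolated Word in Mandarin Speech Recognition with Tone Information

WANG Peng, HU Yu, DAI Lirong, LIU Qingfeng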

Journal of Chinese Information Processing, 2010, Vol. 24, Issue (4): 85-91.
Review


Abstract

Mandarin is a tonal language, so tone information plays a key role in Mandarin speech recognition. Within the existing Hidden Markov Model (HMM) framework, how to use tone information effectively remains an open research question. Current Mandarin speech recognition systems use tone information in two main ways. The first is the Embedded Tone Model, in which tone-related features are appended to the spectral features to form a single augmented feature stream for training the HMMs. The second is the Explicit Tone Model, in which tones are modeled separately from syllables and the resulting tone model is used to optimize the existing decoding network. This paper combines the two approaches for isolated-word recognition. First, an Embedded Tone Model built on two-stream rather than conventional single-stream modeling produces the N-best candidates. Then an Explicit Tone Model with left-dependent tone modeling rescores the N-best candidates to give the final recognition result. Compared with a traditional model without tone information, the proposed method achieves an average absolute improvement in recognition rate of more than 3.0% across all test sets, and up to 5.36% absolute improvement on the third test set (NoiseCar).
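The two-pass scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the stream weights, the `tone_model_weight` parameter, and the form of the left-dependent tone scorer (a function of the previous tone and the current tone) are all assumptions made for illustration. The first function shows how a two-stream model might combine per-stream log-likelihoods; the second shows how an explicit tone model might rescore an N-best list.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Hypothesis:
    """One N-best candidate from the first pass (hypothetical structure)."""
    word: str
    tones: Tuple[int, ...]   # one tone label per syllable (assumed encoding)
    first_pass_score: float  # log score from the embedded tone model decoder


def two_stream_log_likelihood(spectral_ll: float, tone_ll: float,
                              spectral_weight: float = 1.0,
                              tone_weight: float = 1.0) -> float:
    """Combine spectral and tone stream log-likelihoods as a weighted sum.

    The weights are assumed tuning parameters, not values from the paper.
    """
    return spectral_weight * spectral_ll + tone_weight * tone_ll


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    tone_log_score: Callable[[Optional[int], int], float],
    tone_model_weight: float = 1.0,
) -> List[Tuple[float, Hypothesis]]:
    """Add a left-dependent tone score to each candidate and re-rank.

    tone_log_score(left_tone, tone) returns a log score for `tone` given the
    tone of the preceding syllable (None for the first syllable).
    """
    rescored = []
    for hyp in nbest:
        left: Optional[int] = None
        tone_total = 0.0
        for tone in hyp.tones:
            tone_total += tone_log_score(left, tone)
            left = tone  # current tone becomes the left context of the next syllable
        total = hyp.first_pass_score + tone_model_weight * tone_total
        rescored.append((total, hyp))
    # Highest combined log score first
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored
```

In this sketch the first-pass score and the tone score are combined linearly in the log domain, so `tone_model_weight` controls how strongly the explicit tone model can reorder the first-pass ranking; a weight of 0 leaves the N-best order unchanged.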

Key words

computer application / Chinese information processing / Mandarin speech recognition / tone information / tone model / two-stream model

Cite this article

WANG Peng, HU Yu, DAI Lirong, LIU Qingfeng. Study on the Identification of the Isolated Word in Mandarin Speech Recognition with Tone Information. Journal of Chinese Information Processing. 2010, 24(4): 85-91

     ()     ()
PDF(752 KB)

482

Accesses

0

Citation

Detail

段落导航
相关文章

/