Abstract
To address the instability of burst spectrum features, this paper proposes a Chinese stop consonant detection method based on the energy change rate. A set of parameters describing the energy change rate of speech segments is first extracted from the Seneff auditory spectrum and then transformed with the Fisherface method. The transformed features are classified with a K-nearest-neighbor (KNN) classifier to detect stops, and model performance is evaluated by leave-one-out cross-validation. Experimental results show a stop detection accuracy of 96.39% on clean speech and 88.07% on speech with a signal-to-noise ratio (SNR) of 10 dB, indicating good stability and generalization.
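The evaluation step described in the abstract, a KNN classifier scored by leave-one-out cross-validation, can be illustrated with a minimal sketch. The feature vectors, the labels, the value `k=3`, and the function names `knn_predict` and `leave_one_out_accuracy` are all hypothetical; the Seneff auditory spectrum features and the Fisherface transform are not reproduced here, and the toy data has no connection to the reported accuracies.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Euclidean distance from the query to every training vector.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(features, labels, k=3):
    """Hold out each sample once, classify it with the rest, return accuracy."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, features[i], k) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical 2-D feature vectors standing in for transformed
# energy-change-rate features (assumed values, not data from the paper).
features = [
    (0.9, 0.8), (1.0, 0.7), (0.8, 0.9), (1.1, 1.0),  # assumed "stop" segments
    (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0),  # assumed "non-stop" segments
]
labels = ["stop"] * 4 + ["non-stop"] * 4

accuracy = leave_one_out_accuracy(features, labels, k=3)
print(f"leave-one-out accuracy: {accuracy:.2%}")
```

Leave-one-out validation uses every sample as a test case exactly once, which suits small labeled sets; the cost is one classification pass per sample.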
Key words
stop detection /
energy change rate /
articulatory characteristic /
Seneff auditory model
Funding
National Natural Science Foundation of China (61175017)