Improvements on Mandarin Pronunciation Evaluation

QI Xin1, XIAO Yunpeng1, 2, YE Weiping1

Journal of Chinese Information Processing, 2013, Vol. 27, Issue (3): 48-56.
Review


Abstract

To reduce the impact of background noise on evaluation performance, this paper introduces PNCC (Power-Normalized Cepstral Coefficients) features into Mandarin pronunciation evaluation. On a database of real Mandarin test recordings, the score correlation obtained with PNCC is 6.6% higher than with conventional MFCC features. Building on this result, different ways of splitting Chinese syllables into acoustic model units are investigated, and an initial-medial + final (IMF) splitting is applied to pronunciation evaluation. With this splitting, the total error rate of the evaluation system falls by 5.6% and the correlation with expert scores rises by 0.056. Finally, the choice of the best number of HMM states is discussed, and two mixed scoring schemes are proposed: mixing models with different numbers of states, and combining the scores produced by different configurations. Compared with a 3-state model under the same conditions, these schemes raise the correlation by 0.021 and 0.017, respectively.
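The results above are reported as correlations between machine scores and expert scores, and one of the proposed schemes combines scores from different configurations. The sketch below illustrates both ideas under stated assumptions: the abstract does not say which correlation measure or combination rule the authors used, so Pearson correlation and a weighted average are assumed here. The function names `pearson` and `combine_scores` and all sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: the paper's "score correlation" is taken to be Pearson's r.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def combine_scores(score_sets, weights=None):
    """Weighted average of per-speaker scores from several configurations.

    Assumption: a plain weighted mean stands in for the paper's
    combined-score scheme; equal weights are used when none are given.
    """
    n_configs = len(score_sets)
    if weights is None:
        weights = [1.0 / n_configs] * n_configs
    n_speakers = len(score_sets[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_sets))
        for i in range(n_speakers)
    ]


if __name__ == "__main__":
    # Hypothetical scores for four speakers; not data from the paper.
    expert = [85.0, 70.0, 92.0, 60.0]
    config_a = [80.0, 72.0, 90.0, 65.0]  # e.g. one model configuration
    config_b = [83.0, 68.0, 88.0, 58.0]  # e.g. another configuration
    combined = combine_scores([config_a, config_b])
    print("config A vs expert:", pearson(config_a, expert))
    print("combined vs expert:", pearson(combined, expert))
```

Whether a combined score correlates better with the experts than either single configuration depends on the data; the sketch only shows how such a comparison could be set up.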

Key words

Mandarin pronunciation evaluation / PNCC / initial-medial and final / HMM states

Cite this article

QI Xin, XIAO Yunpeng, YE Weiping. Improvements on Mandarin Pronunciation Evaluation. Journal of Chinese Information Processing. 2013, 27(3): 48-56


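Funding

Supported by a 2010 Beijing Normal University Independent Research Fund project and a 2010 Beijing Normal University Teaching Construction and Reform project.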