Abstract
This paper first discusses the criteria for distinguishing Chinese dialects and the basic principles of feature selection, and from these principles derives a novel feature, the district differential cepstral feature. A dialect identification system is then built by combining a GMM tokenizer, an N-gram language model, and an ANN. Compared with traditional language identification (LID) systems, the new system has the following characteristics. First, it does not require a labeled speech database, which makes building Chinese dialect speech corpora less labor-intensive and less demanding. Second, the GMM tokenizer requires far less computation than a phoneme recognizer, which speeds up dialect identification and facilitates future real-time processing. Third, it achieves higher identification accuracy and better fault tolerance. Experiments on Mandarin Chinese and three Chinese dialects show that the system reaches an average identification rate of 83.8%.
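The tokenizer and language-model stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function and class names (`tokenize`, `BigramModel`, `identify`), the diagonal-covariance GMM, the add-one smoothing, and the use of a bigram (N = 2) are all assumptions made for illustration. The ANN back end is replaced here by a simple argmax over per-dialect scores, and the district differential cepstral feature is not reproduced, so frames are plain lists of floats.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring mixture component."""
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


class BigramModel:
    """Bigram model over token indices with add-one smoothing (an assumption)."""

    def __init__(self, num_tokens):
        self.num_tokens = num_tokens
        self.pair_counts = Counter()
        self.context_counts = Counter()

    def train(self, sequences):
        """Accumulate bigram counts from token sequences of one dialect."""
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_likelihood(self, seq):
        """Sum of smoothed log bigram probabilities over the sequence."""
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.pair_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + self.num_tokens
            total += math.log(num / den)
        return total


def identify(tokens, models):
    """Pick the dialect whose bigram model scores the token sequence highest.

    This argmax stands in for the ANN back end named in the abstract.
    """
    return max(models, key=lambda name: models[name].log_likelihood(tokens))


# Toy two-component GMM over one-dimensional frames (hypothetical values).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0], [5.0]]
VARIANCES = [[1.0], [1.0]]

# Two hypothetical "dialects" that differ only in token order statistics.
models = {"A": BigramModel(2), "B": BigramModel(2)}
models["A"].train([[0, 1, 0, 1, 0, 1]])  # alternating tokens
models["B"].train([[0, 0, 0, 1, 1, 1]])  # long runs of one token

test_tokens = tokenize([[0.1], [4.9], [0.2], [5.1]], WEIGHTS, MEANS, VARIANCES)
best = identify(test_tokens, models)
```

Because tokenization only needs per-frame Gaussian scores, no phoneme-level labels are required to train or apply it, which is the property the abstract highlights.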
Key words
computer application /
Chinese information processing /
GMM tokenizer /
N-gram language model /
Chinese dialect identification
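Funding
Supported by the Jiangsu Province "Tenth Five-Year Plan" Social Science Foundation (K3-013) and the Jiangsu Province Natural Science Foundation for Universities (99KJB510002).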