Abstract: This paper investigates and compares three approaches to Chinese-English bilingual acoustic modeling. The first simply combines the Chinese and English phone inventories, with no phones shared across the languages. The second maps language-dependent phones to the inventory of the International Phonetic Association (IPA), based on phonetic knowledge, to construct the bilingual phone inventory. The third merges the language-dependent phone models with a hierarchical phone clustering algorithm to obtain a compact bilingual inventory. Experimental results show that the phone clustering approach outperforms the IPA-based phone mapping approach, and that it achieves performance comparable to the simple combination of language-dependent phone inventories with fewer model parameters, especially when an acoustic likelihood measure is used.
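The third approach can be illustrated with a minimal sketch of bottom-up (agglomerative) phone clustering. This is not the paper's implementation. It assumes each phone model is summarized by a single diagonal-covariance Gaussian, uses the Bhattacharyya distance as a stand-in for the paper's distance measures, and merges clusters by simple moment matching. The phone labels, feature values, threshold, and function names (`bhattacharyya`, `merge`, `cluster_phones`, `TOY_MODELS`) are all hypothetical.

```python
import math

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    d = 0.0
    for a, b, s1, s2 in zip(m1, m2, v1, v2):
        s = 0.5 * (s1 + s2)  # averaged variance in this dimension
        d += 0.125 * (a - b) ** 2 / s + 0.5 * math.log(s / math.sqrt(s1 * s2))
    return d

def merge(c1, c2):
    """Merge two clusters by moment matching, weighted by member count."""
    n1, n2 = len(c1["phones"]), len(c2["phones"])
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    mean = [w1 * a + w2 * b for a, b in zip(c1["mean"], c2["mean"])]
    var = [
        w1 * (s1 + a * a) + w2 * (s2 + b * b) - m * m
        for a, b, s1, s2, m in zip(c1["mean"], c2["mean"], c1["var"], c2["var"], mean)
    ]
    return {"phones": c1["phones"] + c2["phones"], "mean": mean, "var": var}

def cluster_phones(models, threshold):
    """Repeatedly merge the closest pair until the closest pair exceeds threshold."""
    clusters = [
        {"phones": [name], "mean": list(m), "var": list(v)}
        for name, (m, v) in models.items()
    ]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = bhattacharyya(clusters[i]["mean"], clusters[i]["var"],
                                  clusters[j]["mean"], clusters[j]["var"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to share a model
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c["phones"]) for c in clusters)

# Hypothetical toy models: phone label -> (mean vector, variance vector).
TOY_MODELS = {
    "zh:a":  ([1.0, 0.0], [1.0, 1.0]),
    "en:aa": ([1.1, 0.1], [1.0, 1.0]),
    "zh:s":  ([5.0, 5.0], [1.0, 1.0]),
    "en:s":  ([5.2, 4.9], [1.0, 1.0]),
    "en:th": ([-4.0, 6.0], [1.0, 1.0]),
}

if __name__ == "__main__":
    for group in cluster_phones(TOY_MODELS, threshold=0.1):
        print(group)
```

In this toy setup, acoustically close Chinese and English phones end up sharing one cluster, while a phone with no close counterpart stays language-dependent. The threshold controls how compact the resulting bilingual inventory is.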