Abstract: A crucial issue in triphone-based continuous speech recognition is the large number of parameters that must be estimated from a limited amount of training data. To cope with this problem, two major context-clustering methods, agglomerative (AGG) and tree-based (TB), have been widely investigated. We analyze the advantages and disadvantages of both algorithms, develop several methods to improve on them, and introduce a novel combined method within the maximum likelihood framework. For large-vocabulary continuous speech recognition (LVCSR), the experimental results show that the proposed combined method substantially improves performance over the existing TB method alone.