Abstract: To improve in-vocabulary performance in the Mongolian speech keyword spotting task, we propose a keyword spotting method that searches for word stems, exploiting the rules of Mongolian word formation. First, Mongolian speech is decoded into a lattice by a segmentation-based LVCSR system, and the lattice is converted into a confusion network. Then, keywords are detected by matching their stems against the confusion network. Experimental results show that the proposed method outperforms baselines based on the word confusion network.
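The stem-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix list `SUFFIXES`, the helper `stem_of`, the function `spot_keyword`, the posterior threshold, and the Latin-transliterated toy network are all assumptions made for illustration. The confusion network is modeled as a sequence of slots, each holding competing tokens with posterior probabilities.

```python
from typing import List, Tuple

# One confusion-network slot: competing tokens with posterior probabilities.
Slot = List[Tuple[str, float]]

# Hypothetical suffix inventory (Latin transliteration, illustration only).
SUFFIXES = ("iin", "aas", "d")


def stem_of(word: str) -> str:
    """Strip the longest matching hypothetical suffix (stand-in for a real stemmer)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def spot_keyword(network: List[Slot], keyword: str,
                 threshold: float = 0.3) -> List[Tuple[int, float]]:
    """Return (slot_index, posterior) for every slot that contains the keyword's stem."""
    stem = stem_of(keyword)
    hits = []
    for index, slot in enumerate(network):
        for token, posterior in slot:
            # Assumed decision rule: exact stem match above a posterior threshold.
            if token == stem and posterior >= threshold:
                hits.append((index, posterior))
    return hits


# Toy confusion network, assuming the recognizer emits stems and suffixes as units.
network = [
    [("mongol", 0.7), ("mongo", 0.3)],
    [("iin", 0.6), ("<eps>", 0.4)],
    [("ger", 0.5), ("gar", 0.5)],
]

print(spot_keyword(network, "mongoliin"))
```

In this sketch the inflected query `mongoliin` is reduced to the stem `mongol`, so it is found in the first slot even though the full word form never appears as a single token in the network.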
[1] Feilong Bao, Guanglai Gao. Improving of Acoustic Model for the Mongolian Speech Recognition System[C]//Proceedings of The Chinese Conference on Pattern Recognition (CCPR2009), Nanjing, 2009: 616-620.
[2] Feilong Bao, Guanglai Gao, Xueliang Yan. Segmentation-based Mongolian LVCSR Approach[C]//Proceedings of The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP2013), Vancouver, 2013: 8136-8139.
[3] Feilong Bao, Guanglai Gao, Yulai Bao. The Research on Mongolian Spoken Term Detection Based on Confusion Network[C]//Proceedings of The Chinese Conference on Pattern Recognition (CCPR2012), Beijing, 2012: 606-612.
[4] Chinggeltei (清格尔泰). Mongolian Grammar (蒙古语语法) [M]. Inner Mongolia People's Publishing House, 1992.
[5] L Mangu, E Brill, A Stolcke. Finding consensus in speech recognition: word error minimization and other applications of confusion networks [J]. Computer Speech and Language, 2000, 14(4): 373-400.
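[6] Huang Xiangsong (黄湘松). Research on Chinese Spoken Document Retrieval Based on Confusion Networks [D]. Doctoral dissertation, Harbin Engineering University, 2010.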
[7] J Mamou, B Ramabhadran, O Siohan. Vocabulary independent spoken term detection[C]//Proceedings of ACM-SIGIR07, Amsterdam, 2007: 615-622.
[8] P Yu, K Chen, C Ma, et al. Vocabulary-independent indexing of spontaneous speech[J]. IEEE Transactions on Speech and Audio Processing, 2005, 13(5): 635-643.
[9] S Young, et al. The HTK Book (Revised for HTK version 3.4.1)[M]. Cambridge University, 2009.
[10] A Stolcke. SRILM - An Extensible Language Modeling Toolkit[C]//Proceedings of International Conference on Spoken Language Processing, Denver, Colorado, 2002.