Abstract: In the age of big data, it is increasingly important to locate sensitive key information in the ever-growing volume of audio and video data. Although this technology, known as speech retrieval, has been studied extensively for Chinese and English, Uyghur speech retrieval is still in its infancy. This paper investigates the problem and builds a Uyghur speech retrieval system that combines large-vocabulary continuous speech recognition, confusion networks derived from recognition lattices, an inverted index, and relevance estimation. Experimental results show that, at a speech recognition accuracy of 82.1%, the system achieves recall rates of 97.0% and 79.1%, with false alarm rates of 13.5% and 8.5%, respectively.
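The indexing and retrieval stages named in the abstract can be illustrated with a minimal sketch. It assumes each confusion network is represented as a list of slots mapping candidate words to posterior probabilities, and it uses the highest posterior per document as a stand-in relevance score. The function names, the threshold parameter, and the toy data are hypothetical and are not taken from the paper's system.

```python
from collections import defaultdict

# Assumption: a confusion network is a list of slots, and each slot maps
# candidate words to posterior probabilities (toy values, not from the paper).

def build_index(networks):
    """Map each word to a list of (doc_id, slot_position, posterior) postings."""
    index = defaultdict(list)
    for doc_id, slots in networks.items():
        for pos, slot in enumerate(slots):
            for word, posterior in slot.items():
                index[word].append((doc_id, pos, posterior))
    return index

def search(index, keyword, threshold=0.5):
    """Return (doc_id, score) pairs whose best posterior meets the threshold."""
    best = {}
    for doc_id, _pos, posterior in index.get(keyword, []):
        best[doc_id] = max(best.get(doc_id, 0.0), posterior)
    hits = [(d, p) for d, p in best.items() if p >= threshold]
    # Highest-scoring documents first.
    return sorted(hits, key=lambda item: -item[1])

# Hypothetical two-clip collection used only for illustration.
networks = {
    "clip_a": [{"salam": 0.9, "alam": 0.1}, {"dunya": 0.6, "dunyada": 0.4}],
    "clip_b": [{"dunya": 0.3, "dunyada": 0.7}],
}
index = build_index(networks)
print(search(index, "dunya"))
print(search(index, "dunya", threshold=0.2))
```

In a scheme like this, lowering the threshold generally returns more true hits at the cost of more false alarms, which is the kind of trade-off that recall and false alarm figures describe.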