MENG Sha, LIU Jia. Out-of-Vocabulary Issue in Chinese Spoken Term Detection and A Two-Stage Chinese Speech Retrieval Method[J]. Journal of Chinese Information Processing, 2009, 23(6): 91-98.
Out-of-Vocabulary Issue in Chinese Spoken Term Detection and A Two-Stage Chinese Speech Retrieval Method
MENG Sha, LIU Jia
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: The out-of-vocabulary (OOV) problem is a recognized challenge in English spoken term detection, but it is underestimated for Chinese. The reason is that a Chinese OOV query term can still be matched as a sequence of Chinese characters, since each character is itself a word in the vocabulary. However, our experiments show that search accuracy differs significantly depending on whether the query is in the vocabulary. We examine this problem on a word-lattice-based spoken term detection task. We propose a two-stage method that first locates candidates by partial phonetic matching and then refines the matching score by word lattice rescoring. Experiments show that the proposed method achieves a 24.1% relative improvement for OOV queries on a large-scale Chinese spoken term detection task.
Key words: computer application; Chinese information processing; Chinese spoken term detection; out-of-vocabulary; lattice; large-vocabulary continuous speech recognition
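The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pinyin syllable representation, the position-wise overlap measure, the `min_overlap` threshold, the `lattice_posterior` lookup table, and the linear interpolation with `weight` are all assumptions chosen only to show the shape of "locate candidates by partial phonetic matching, then rescore with lattice evidence".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int      # index of the first syllable of the matched window
    overlap: float  # fraction of query syllables matched position by position


def partial_phonetic_match(query, syllables, min_overlap=0.5):
    """Stage 1 (assumed form): slide the query over the recognized syllable
    stream and keep windows where enough positions agree."""
    n = len(query)
    hits = []
    for start in range(len(syllables) - n + 1):
        window = syllables[start:start + n]
        matched = sum(q == s for q, s in zip(query, window))
        overlap = matched / n
        if overlap >= min_overlap:
            hits.append(Candidate(start, overlap))
    return hits


def rescore(candidates, lattice_posterior, weight=0.5):
    """Stage 2 (assumed form): interpolate the phonetic overlap with a
    hypothetical lattice posterior looked up by candidate start position."""
    scored = []
    for c in candidates:
        posterior = lattice_posterior.get(c.start, 0.0)
        score = weight * c.overlap + (1 - weight) * posterior
        scored.append((c.start, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: the query is misrecognized in its last syllable.
query = ["qing", "hua", "da", "xue"]
syllables = ["wo", "zai", "qing", "hua", "da", "xie", "du", "shu"]
candidates = partial_phonetic_match(query, syllables)
ranked = rescore(candidates, lattice_posterior={2: 0.9})
```

Here the partial match tolerates one wrong syllable, so the window starting at position 2 survives stage 1 and is then ranked by the combined score. In a real system the posterior would come from the word lattice rather than from a fixed table.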