Abstract
We propose a two-level unsupervised audio segmentation method that detects acoustic change points, such as changes of speaker, environment, and channel, in a continuous audio stream. The change detection process is divided into two levels: a region level, which slides a fixed-length detection window along the stream to quickly locate potential change regions containing candidate change points, and a boundary level, which searches for and refines the true change points. At the region level, we apply a modified generalized likelihood ratio (GLR) criterion to detect the potential change regions; it requires no threshold and still keeps the miss rate low. At the boundary level, we combine the T² statistic with the Bayesian Information Criterion (BIC) to quickly determine segment boundaries within the potential change regions. Experimental results on the 1997 Broadcast News Hub4-NE Mandarin corpus show that, compared with a conventional hybrid segmentation algorithm, the proposed method runs faster and raises the recall of acoustic change points by nearly 10.5%.
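The boundary-level idea (rank candidate split points with a T² statistic, then confirm the best one with BIC) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes diagonal covariances, a variance estimated over the whole window for the T² statistic, and one change point at most per window. The names `mean_var`, `t2_statistic`, `delta_bic`, `detect_boundary`, `margin`, and `penalty_weight` are hypothetical. The modified GLR search at the region level is not reproduced, since the abstract does not define it.

```python
import math
import random


def mean_var(frames):
    """Per-dimension mean and maximum-likelihood variance of a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so later divisions and logarithms stay finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-8)
           for d in range(dim)]
    return mean, var


def t2_statistic(frames, b):
    """T2-style distance between frames[:b] and frames[b:].

    Assumption: a diagonal covariance estimated over the whole window.
    """
    n = len(frames)
    m1, _ = mean_var(frames[:b])
    m2, _ = mean_var(frames[b:])
    _, var = mean_var(frames)
    return b * (n - b) / n * sum((x - y) ** 2 / v for x, y, v in zip(m1, m2, var))


def delta_bic(frames, b, penalty_weight=1.0):
    """BIC gain of modelling the window as two diagonal Gaussians split at b.

    A positive value favours the hypothesis that a change occurs at b.
    """
    n, dim = len(frames), len(frames[0])
    _, v = mean_var(frames)
    _, v1 = mean_var(frames[:b])
    _, v2 = mean_var(frames[b:])

    def logdet(var):
        # Log-determinant of a diagonal covariance matrix.
        return sum(math.log(x) for x in var)

    gain = 0.5 * (n * logdet(v) - b * logdet(v1) - (n - b) * logdet(v2))
    # The two-segment model has 2 * dim extra parameters (means and variances).
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def detect_boundary(frames, margin=10, penalty_weight=1.0):
    """Return the index of the detected change point in the window, or None."""
    n = len(frames)
    if n <= 2 * margin:
        return None  # window too short to estimate statistics on both sides
    # Step 1: pick the split with the largest T2 statistic.
    best = max(range(margin, n - margin), key=lambda b: t2_statistic(frames, b))
    # Step 2: accept it only if BIC prefers the two-segment model.
    return best if delta_bic(frames, best, penalty_weight) > 0 else None


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 4-dimensional "features" whose mean shifts after frame 100.
    window = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
    window += [[rng.gauss(3.0, 1.0) for _ in range(4)] for _ in range(100)]
    print(detect_boundary(window))
```

In this sketch the T² statistic needs only segment means for each candidate split, so BIC is evaluated once per window, on the best candidate, instead of at every frame.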
Key words
artificial intelligence /
pattern recognition /
two-level unsupervised audio segmentation /
modified generalized likelihood ratio /
region level /
boundary level
Funding
National Natural Science Foundation of China (60475014)