A Two-Level Unsupervised Algorithm for Audio Segmentation

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 2: 106-111.

Review

ZHANG Shi-lei (张世磊), ZHANG Shu-wu (张树武), XU Bo (徐波)

Abstract

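This paper proposes a two-level unsupervised audio segmentation algorithm for detecting the points in an audio stream where acoustic characteristics change, such as changes of speaker, environment, or channel. The method splits segmentation into two levels: a region level and a boundary level. At the region level, a fixed-length detection window is shifted along the stream to quickly locate the regions that contain acoustic change points; a modified generalized log-likelihood ratio (GLR) criterion is used to detect these potential change regions, which requires no threshold to be set while still guaranteeing a low miss rate. At the boundary level, segment boundaries inside each potential change region are determined quickly by combining the T² statistic with the Bayesian Information Criterion (BIC). Experiments on the 1997 Hub4 Mandarin broadcast news speech corpus show that, compared with the conventional hybrid segmentation algorithm, the proposed algorithm runs faster and raises the recall of acoustic change points by 10.5%.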
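
The region-level search can be pictured with the following Python/NumPy sketch. It is not the authors' implementation: the paper's *modified* GLR criterion is not spelled out here, so the sketch uses the standard GLR distance between two adjacent windows, each modelled by a single full-covariance Gaussian, and it stands in for "no threshold" by keeping every local maximum of the distance curve. The function names, the window length `win`, and the shift `shift` (both in frames) are hypothetical choices for illustration.

```python
import numpy as np


def ml_logdet(frames: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood covariance of frames (n_frames x dim)."""
    dim = frames.shape[1]
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(dim)  # small ridge keeps short windows well-conditioned
    return float(np.linalg.slogdet(cov)[1])


def glr_distance(x1: np.ndarray, x2: np.ndarray) -> float:
    """Standard GLR distance: -log of the likelihood ratio between 'one Gaussian for
    both windows' and 'one Gaussian per window' (the constant terms cancel)."""
    n1, n2 = len(x1), len(x2)
    joint = np.vstack([x1, x2])
    return 0.5 * ((n1 + n2) * ml_logdet(joint) - n1 * ml_logdet(x1) - n2 * ml_logdet(x2))


def potential_change_regions(features: np.ndarray, win: int = 100, shift: int = 10):
    """Slide two adjacent fixed-length windows over the stream (hypothetical helper) and
    return the frame ranges (start, end) centred on every local maximum of the GLR curve."""
    centres, scores = [], []
    for c in range(win, len(features) - win + 1, shift):
        centres.append(c)
        scores.append(glr_distance(features[c - win:c], features[c:c + win]))
    regions = []
    for i in range(1, len(scores) - 1):
        # A local peak replaces a fixed threshold: this keeps the miss rate low and
        # leaves the final accept/reject decision to the boundary level.
        if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]:
            regions.append((centres[i] - win, centres[i] + win))
    return regions
```

In practice `features` would be a frame-level feature matrix (for example cepstral vectors). Because every local peak is kept, homogeneous stretches also yield some spurious regions; under this sketch's assumptions that is acceptable, since the goal at this level is a low miss rate and the boundary level is what rejects false candidates.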

Abstract

We propose a two-level unsupervised method for audio segmentation that effectively detects speaker, environment, and channel changes in a continuous audio stream. The change detection process is divided into two levels: a region level, which detects potential change regions containing candidate acoustic change points, and a boundary level, which searches for and refines the true change points. At the region level, we employ a modified Generalized Likelihood Ratio (GLR) metric to search for potential change regions in consecutive local windows without setting any threshold. At the boundary level, we apply the T² statistic and the Bayesian Information Criterion (BIC) to detect segment boundaries within the potential change regions. Experimental results on the 1997 Broadcast News Hub4-NE Mandarin corpus show that, compared with a conventional hybrid segmentation algorithm, the proposed scheme increases recall by nearly 10.5%.
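
Boundary refinement inside one potential change region can be sketched in Python/NumPy as well, again under stated assumptions rather than as the authors' code. In the spirit of Zhou and Hansen's T²-based BIC, Hotelling's T² statistic (using the whole window's covariance as the shared covariance, an assumption of this sketch) picks the single most likely split point, and the standard ΔBIC test of Chen and Gopalakrishnan then accepts or rejects it. The names `refine_boundary`, `margin` (minimum frames on each side of a split), and `lam` (penalty weight) are hypothetical.

```python
import numpy as np


def ml_cov(frames: np.ndarray) -> np.ndarray:
    """Maximum-likelihood covariance of frames (n_frames x dim) with a small ridge."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    return cov + 1e-6 * np.eye(frames.shape[1])


def logdet(frames: np.ndarray) -> float:
    return float(np.linalg.slogdet(ml_cov(frames))[1])


def t2_split_point(window: np.ndarray, margin: int) -> int:
    """Frame index b maximising Hotelling's T^2 between window[:b] and window[b:]."""
    n = len(window)
    inv_cov = np.linalg.inv(ml_cov(window))  # assumed shared covariance: whole window
    best_b, best_t2 = margin, -np.inf
    for b in range(margin, n - margin + 1):
        diff = window[:b].mean(axis=0) - window[b:].mean(axis=0)
        t2 = b * (n - b) / n * float(diff @ inv_cov @ diff)
        if t2 > best_t2:
            best_b, best_t2 = b, t2
    return best_b


def delta_bic(window: np.ndarray, b: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting the window at frame b; a positive value favours a change."""
    n, d = window.shape
    penalty = 0.5 * (d + d * (d + 1) / 2) * np.log(n)  # mean + full-covariance parameters
    gain = 0.5 * (n * logdet(window) - b * logdet(window[:b]) - (n - b) * logdet(window[b:]))
    return gain - lam * penalty


def refine_boundary(window: np.ndarray, margin: int = 20, lam: float = 1.0):
    """T^2 proposes one split point; BIC confirms it. Returns b or None."""
    b = t2_split_point(window, margin)
    return b if delta_bic(window, b, lam) > 0 else None
```

The design point this illustrates is cost: T² needs only mean differences and one matrix inverse per window, so the more expensive BIC evaluation runs once per region instead of at every candidate frame. If the region came from a stream-level search, the absolute boundary would be the region's start frame plus the returned `b`.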

Citation

ZHANG Shi-lei, ZHANG Shu-wu, XU Bo. A Two Level Unsupervised Algorithm for Audio Segmentation. Journal of Chinese Information Processing. 2007, 21(2): 106-111

Funding

National Natural Science Foundation of China (60475014)

Received: 2006-03-12
Published: 2007-04-16
Issue date: 2007-04-16
