Fusing Multi-information for Automatic Story Segmentation of Broadcast News

YU Xiaojie1, WU Ji1, KONG Fanting2, LI Shusen1

Journal of Chinese Information Processing, 2012, Vol. 26, Issue 2: 121-128.
Review

Abstract

Automatic story segmentation of broadcast news programs makes it possible to organize, retrieve, and reuse broadcast news data effectively. Current research on automatic story segmentation focuses mainly on video data. The audio track, however, carries two further sources of evidence: the speech recognition transcript contains rich semantic information, and transitions between acoustic events provide important cues for story boundaries. Based on these two sources, this paper proposes a rule-based multi-information fusion method that uses the audio type information in the neighborhood of each candidate boundary to correct the segmentation produced from the semantic information. Experiments show that after rule-based fusion, the F-measure of story segmentation on broadcast news data reaches 64.8%, and the error probability Pk and WindowDiff reach 18.3% and 24.5%, respectively.
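
The abstract reports Pk and WindowDiff without defining them. The sketch below follows the commonly used definitions of these two metrics (Pk from Beeferman et al., WindowDiff from Pevzner and Hearst); it is not the authors' evaluation code. The function names, the representation of a segmentation as a list of boundary positions, and the default window size (half the average reference segment length) are assumptions made for illustration.

```python
from typing import List, Optional


def segment_ids(boundaries: List[int], n_units: int) -> List[int]:
    """Label each unit (e.g. sentence) with the index of its story.

    Assumed convention: a value b in `boundaries` means a story
    boundary lies between unit b-1 and unit b.
    """
    cuts = set(boundaries)
    ids, current = [], 0
    for i in range(n_units):
        if i > 0 and i in cuts:
            current += 1
        ids.append(current)
    return ids


def default_window(ref_ids: List[int]) -> int:
    """Half the average reference segment length, at least 1."""
    n_segments = ref_ids[-1] + 1
    return max(1, round(len(ref_ids) / (2 * n_segments)))


def pk(ref_ids: List[int], hyp_ids: List[int], k: Optional[int] = None) -> float:
    """Fraction of windows where reference and hypothesis disagree on
    whether units i and i+k belong to the same story."""
    k = k or default_window(ref_ids)
    n = len(ref_ids)
    if len(hyp_ids) != n or n <= k:
        raise ValueError("segmentations must have equal length greater than k")
    errors = sum(
        (ref_ids[i] == ref_ids[i + k]) != (hyp_ids[i] == hyp_ids[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)


def window_diff(ref_ids: List[int], hyp_ids: List[int], k: Optional[int] = None) -> float:
    """Fraction of windows where the number of boundaries between
    units i and i+k differs between reference and hypothesis."""
    k = k or default_window(ref_ids)
    n = len(ref_ids)
    if len(hyp_ids) != n or n <= k:
        raise ValueError("segmentations must have equal length greater than k")
    # Story ids grow by one per boundary, so an id difference is a boundary count.
    errors = sum(
        (ref_ids[i + k] - ref_ids[i]) != (hyp_ids[i + k] - hyp_ids[i])
        for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    # Toy data: 12 units, reference stories start at units 0, 4 and 8.
    ref = segment_ids([4, 8], 12)
    # Hypothesis with one spurious extra boundary at unit 5.
    hyp = segment_ids([4, 5, 8], 12)
    print(pk(ref, hyp), window_diff(ref, hyp))
```

On this toy input the spurious boundary yields a Pk of 0.1 and a WindowDiff of 0.2, which illustrates why both are usually reported: WindowDiff counts boundaries inside each window and therefore penalizes extra boundaries that Pk can miss. Lower values are better for both metrics.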

Key words

automatic story segmentation / improved SeLeCT algorithm / multi-information fusion

Cite this article

YU Xiaojie, WU Ji, KONG Fanting, LI Shusen. Fusing Multi-information for Automatic Story Segmentation of Broadcast News. Journal of Chinese Information Processing. 2012, 26(2): 121-128

