Abstract
The automatic transcription, annotation, and retrieval of broadcast news draw on automatic speech recognition, natural language processing, and information retrieval. This paper surveys the state of the art in automatic annotation and retrieval of broadcast news and analyzes the key techniques involved. On that basis, it proposes a multi-level automatic annotation framework for Mandarin broadcast news and a retrieval method built on that framework. Annotation attributes at the document, utterance, and word levels are examined, and a recursive annotation method is used to refine the attributes level by level. The speech recognition engine and audio stream segmentation, both critical to automatic speech annotation, are also discussed. The proposed approach was applied to the annotation and retrieval of 10 hours of Mandarin broadcast news, with fairly satisfactory experimental results.
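The three annotation levels named in the abstract (document, utterance, word) can be pictured as a nested structure that is filled in level by level and then searched at the word level. The following is only a minimal sketch under stated assumptions: the abstract does not list the actual attributes, so every class name, field (`speaker`, `start`, `end`, `title`), and function here is hypothetical, and the exact-match search stands in for whatever retrieval method the paper uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WordAnnotation:
    # Hypothetical word-level attributes: recognized text and time span (seconds).
    text: str
    start: float
    end: float


@dataclass
class UtteranceAnnotation:
    # Hypothetical utterance-level attributes: speaker label and time span.
    speaker: str
    start: float
    end: float
    words: List[WordAnnotation] = field(default_factory=list)


@dataclass
class DocumentAnnotation:
    # Hypothetical document-level attributes: identifier and title.
    doc_id: str
    title: str
    utterances: List[UtteranceAnnotation] = field(default_factory=list)


def annotate(doc_id: str, title: str,
             segments: List[Tuple[str, List[Tuple[str, float, float]]]]) -> DocumentAnnotation:
    """Refine a document into utterances, then each utterance into words.

    `segments` is assumed to be segmenter/recognizer output in the form
    [(speaker, [(word, start, end), ...]), ...].
    """
    doc = DocumentAnnotation(doc_id, title)
    for speaker, words in segments:
        if not words:
            continue  # skip segments with no recognized words
        word_annots = [WordAnnotation(w, s, e) for w, s, e in words]
        # The utterance span is derived from its first and last word.
        doc.utterances.append(
            UtteranceAnnotation(speaker, word_annots[0].start,
                                word_annots[-1].end, word_annots)
        )
    return doc


def search(docs: List[DocumentAnnotation], query: str) -> List[Tuple[str, float, float]]:
    """Return (doc_id, utterance_start, word_start) for each exact word match."""
    hits = []
    for doc in docs:
        for utt in doc.utterances:
            for word in utt.words:
                if word.text == query:
                    hits.append((doc.doc_id, utt.start, word.start))
    return hits
```

Because each hit carries the document identifier plus utterance and word start times, a retrieval front end could jump straight to the matching position in the audio, which is the practical benefit of keeping all three levels linked.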
Key words
computer application /
Chinese information processing /
broadcast speech /
automatic annotation /
speech retrieval /
acoustic model /
language model
Funding
Supported by the National Natural Science Foundation of China (60572125)