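Discourse tagging is an important task in natural language processing. Many other tasks, such as automatic summarization and question answering, can use discourse tagging to gain an understanding of the content and semantics of a text and thereby obtain better results. At the same time, theories of discourse understanding such as Rhetorical Structure Theory (RST) and Centering Theory (CT) are not closely tied to practical problems and are difficult to apply. Drawing on existing linguistic theories and several discourse-annotated corpora (such as RST-DT and PDTB), and taking the characteristics of natural language processing tasks into account, this paper proposes a tagging scheme for Chinese discourse. The scheme describes the content and logical relations of a discourse fairly accurately and comprehensively, and serves the needs of practical tasks well.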
Abstract
Discourse tagging is a fundamental task in natural language processing and supports a deeper understanding of texts. Many applications, such as automatic summarization and question answering, benefit from a thorough understanding of the text. Building on existing discourse theories such as Rhetorical Structure Theory and Centering Theory, this paper designs a new discourse tagging system that covers both the logical relations and the content of a text, and that serves the practical needs of real natural language processing tasks.
Keywords
discourse semantic tagging /
rhetorical structure theory /
relation tag /
content tag
Key words
discourse tagging /
rhetorical structure theory /
relation tag /
content tag
References
[1] Mann William C, Sandra A Thompson. Rhetorical Structure Theory: Description and Construction of Text Structures[C]//Proceedings of University of Southern California, Information Sciences Institute, 1986.
[2] Walker M A. Centering Theory in Discourse[M]. Oxford: Clarendon Press, 1998.
[3] Carlson Lynn, Daniel Marcu, Mary Ellen Okurowski. Building a discourse-tagged corpus in the framework of rhetorical structure theory[C]//Proceedings of the Second SIGdial Workshop on Discourse and Dialogue-Volume 16. Association for Computational Linguistics, 2001.
[4] The Penn Discourse TreeBank 1.0 Annotation Manual[R]. The PDTB Research Group. March 29, 2006.
[5] Prasad Rashmi, Dinesh Nikhil, Lee Alan, et al. The Penn Discourse TreeBank 2.0[C]//Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). 2008.
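[6] Yue Ming. A study of rhetorical structure annotation of Chinese financial commentary[C]//Proceedings of the 9th China National Conference on Computational Linguistics, 2007. (in Chinese)
[7] Lou Kaiyang. A Study of the Structure of Modern Chinese News Discourse[M]. Beijing: World Publishing Corporation, 2008. (in Chinese)
[8] Li Yi, Kang Shiyong, Sun Maosong, Sun Daogong. A semantic component annotation specification based on an Olympic Games corpus[C]//Proceedings of the 8th Joint Symposium on Computational Linguistics, Nanjing, 2005. (in Chinese)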
[9] Baker Collin F, Charles J Fillmore, John B Lowe. The Berkeley FrameNet project[C]//Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 1998.
[10] Fillmore Charles J. Frame Semantics and the Nature of Language[J]. Annals of the New York Academy of Sciences, 1976,280(1): 20-32.
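Funding
National Natural Science Foundation of China (61273278); National Social Science Foundation of China project (12&ZD227); sub-project of the National Key Technology R&D Program (2011BAH10B04-03); National 863 Program (2012AA011101).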