Frame-Based Chinese Discourse Structure Generation and Discourse Relation Recognition

LV Guoying, SU Na, LI Ru, WANG Zhiqiang, CHAI Qinghua

Journal of Chinese Information Processing, 2015, Vol. 29, Issue 6: 98-109.
Review


Frame-Based Discourse Structure Modeling and Relation Recognition for Chinese Sentence

LV Guoying¹, SU Na¹, LI Ru¹,², WANG Zhiqiang¹, CHAI Qinghua³

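Abstract

This paper introduces frame semantics into three tasks of Chinese discourse analysis: discourse unit segmentation, discourse structure generation and discourse relation recognition. First, a frame-based description scheme for Chinese discourse coherence and a corresponding corpus are constructed. Then, features including sentence-initial words, dependency syntax, phrase structure, target words and frames are extracted to train two maximum-entropy classifiers: one that decides whether a relation holds between two discourse units, and one that identifies the discourse relation. Finally, a greedy algorithm builds the discourse structure tree bottom-up. Experiments show that frame semantics segments discourse units effectively, and that frame features improve the recognition of both discourse structure and discourse relations.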
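Both classifiers described above are maximum-entropy models, which for classification are equivalent to multinomial logistic regression. The sketch below trains such a model over sparse binary string features using plain batch gradient ascent with an L2 penalty. It is an illustration under assumptions, not the paper's implementation: the training procedure, the hyperparameters (`l2`, `lr`, `epochs`), the feature strings (`first=…` standing in for a sentence-initial-word feature, `frame=…` for a frame feature) and the relation labels are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class MaxEnt:
    """Maximum-entropy (multinomial logistic regression) classifier over sparse
    binary string features, trained by batch gradient ascent with an L2 penalty."""

    def __init__(self, l2: float = 0.01, lr: float = 0.5, epochs: int = 200):
        self.l2, self.lr, self.epochs = l2, lr, epochs
        self.labels: List[str] = []
        self.w: Dict[Tuple[str, str], float] = {}  # (feature, label) -> weight

    def predict_proba(self, feats: Iterable[str]) -> Dict[str, float]:
        feats = list(feats)
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def predict(self, feats: Iterable[str]) -> str:
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)

    def fit(self, data: List[Tuple[Set[str], str]]) -> "MaxEnt":
        self.labels = sorted({y for _, y in data})
        n = len(data)
        for _ in range(self.epochs):
            grad: Dict[Tuple[str, str], float] = defaultdict(float)
            for feats, gold in data:
                probs = self.predict_proba(feats)
                for f in feats:
                    for y in self.labels:
                        # observed feature count minus the model's expected count
                        grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
            for key in set(grad) | set(self.w):
                step = grad.get(key, 0.0) / n - self.l2 * self.w.get(key, 0.0)
                self.w[key] = self.w.get(key, 0.0) + self.lr * step
        return self


# Hypothetical features and relation labels, for illustration only.
# 因为 "because", 所以 "so", 但是 "but", 然而 "however".
train = [
    ({"first=因为", "frame=Causation"}, "Causal"),
    ({"first=所以", "frame=Causation"}, "Causal"),
    ({"first=但是", "frame=Contrast"}, "Adversative"),
    ({"first=然而", "frame=Contrast"}, "Adversative"),
]
clf = MaxEnt().fit(train)
print(clf.predict({"first=但是"}))  # Adversative
```

The same class covers both tasks in this sketch: with the labels `{"Rel", "NoRel"}` it acts as a relation-existence classifier whose `predict_proba` output can feed a tree-building step, and with relation-type labels it acts as the relation classifier.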

Abstract

Frame semantics is introduced into Chinese discourse analysis, which comprises three subtasks: discourse segmentation, discourse structure modeling and discourse relation recognition. First, a Chinese discourse coherence framework and a corresponding corpus are built on the basis of frame semantics. Then two maximum entropy classifiers are trained, one to recognize whether a relation exists between discourse units and one to identify the class of the discourse relation, using lexical features, dependency parser features, syntactic parser features, target features and frame semantic features. Finally, the probability that a relation exists between discourse units is used to generate the discourse structure with a greedy bottom-up method. Experimental results show that frame semantics can segment discourse units effectively, and that frame semantic features improve the performance of both discourse structure construction and discourse relation recognition.
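The greedy bottom-up step can be pictured as follows: start from the elementary discourse units, score every pair of adjacent spans with the probability that a relation exists between them, merge the highest-scoring pair into one span, and repeat until a single tree remains. The sketch below is a minimal illustration under assumptions: `rel_prob` is a hypothetical callable standing in for the relation-existence classifier, ties are broken toward the leftmost pair, and the probabilities in `TOY` are invented for the demo rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Node:
    """Internal node of the discourse tree: two adjacent spans joined by a relation."""
    left: "Tree"
    right: "Tree"
    prob: float  # probability that a relation links left and right


# A tree is either an elementary discourse unit (a string) or a merged Node.
Tree = Union[str, Node]


def leaves(t: Tree) -> List[str]:
    return [t] if isinstance(t, str) else leaves(t.left) + leaves(t.right)


def build_tree(units: List[str], rel_prob: Callable[[Tree, Tree], float]) -> Tree:
    """Greedily merge the adjacent pair of spans with the highest relation probability."""
    if not units:
        raise ValueError("need at least one discourse unit")
    spans: List[Tree] = list(units)
    while len(spans) > 1:
        # Score every pair of adjacent spans.
        scores = [rel_prob(spans[i], spans[i + 1]) for i in range(len(spans) - 1)]
        # Greedy step: merge the most probable pair (max() keeps the leftmost on ties).
        i = max(range(len(scores)), key=scores.__getitem__)
        spans[i:i + 2] = [Node(spans[i], spans[i + 1], scores[i])]
    return spans[0]


def to_brackets(t: Tree) -> str:
    return t if isinstance(t, str) else f"({to_brackets(t.left)} {to_brackets(t.right)})"


# Hypothetical probabilities, keyed by the leaf sequences of the two spans.
TOY = {
    (("e1",), ("e2",)): 0.2,
    (("e2",), ("e3",)): 0.8,
    (("e1",), ("e2", "e3")): 0.6,
}


def toy_rel_prob(a: Tree, b: Tree) -> float:
    return TOY.get((tuple(leaves(a)), tuple(leaves(b))), 0.0)


tree = build_tree(["e1", "e2", "e3"], toy_rel_prob)
print(to_brackets(tree))  # (e1 (e2 e3))
```

Here `e2` and `e3` are merged first because their pair scores 0.8 against 0.2 for `e1` and `e2`; the remaining two spans are then joined. Each round rescans all adjacent pairs, so this naive version makes O(n²) probability calls for n units.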
Key words: Discourse Units; Discourse Structure; Discourse Relation; Greedy Bottom-up Method

Cite this article

LV Guoying, SU Na, LI Ru, WANG Zhiqiang, CHAI Qinghua. Frame-Based Discourse Structure Modeling and Relation Recognition for Chinese Sentence. Journal of Chinese Information Processing, 2015, 29(6): 98-109.

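Funding

National Natural Science Foundation of China (61373082); Shanxi Provincial Science and Technology Infrastructure Platform Construction Project (2014091004-0103); Shanxi Provincial Research Grant for Returned Overseas Scholars (2013-015); National High-Tech R&D Program of China (863 Program) (2015AA015407); Open Project Fund of the Information Security Evaluation Center, Civil Aviation University of China (CACC-ISECCA-201402)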