1. School of Computer & Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education,
Shanxi University, Taiyuan, Shanxi 030006, China;
3. School of Foreign Languages, Shanxi University, Taiyuan, Shanxi 030006, China)
Abstract: Frame semantics is introduced into Chinese discourse analysis, which comprises three subtasks: discourse segmentation, discourse structure modeling, and discourse relation recognition. First, a Chinese discourse coherence framework and a corresponding corpus are built on the basis of frame semantics. Then two maximum entropy classifiers are applied, one to recognize whether a relation holds between discourse units and one to recognize the class of that relation, using lexical features, dependency parsing features, syntactic parsing features, target features, and frame semantic features. Finally, the probability that a relation exists between discourse units is used to generate the discourse structure with a greedy bottom-up method. Experimental results show that frame semantics can segment discourse units effectively and that frame semantic features improve the performance of discourse structure construction and discourse relation recognition.
Key words: Discourse Units; Discourse Structure; Discourse Relation; Greedy Bottom-up Method