Survey of Discourse Analysis Methods

XU Fan, ZHU Qiaoming, ZHOU Guodong

Journal of Chinese Information Processing ›› 2013, Vol. 27 ›› Issue (3): 20-33.
Survey


Abstract

Discourse, a unit of text analysis beyond the word and the sentence, plays a crucial role in natural language understanding and natural language generation. From the perspective of computational linguistics, this paper surveys the current state of research on Chinese and English discourse analysis. It introduces the applications of Chinese and English discourse analysis in natural language processing, and then describes the techniques in detail from three angles: discourse theories, discourse corpora and their evaluation, and the automatic construction of discourse parsers. Finally, it outlines several directions for further research on discourse analysis.
Key words

discourse / discourse analysis / corpus / evaluation

Cite this article

XU Fan, ZHU Qiaoming, ZHOU Guodong. Survey of Discourse Analysis Methods. Journal of Chinese Information Processing. 2013, 27(3): 20-33


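Funding

National Natural Science Foundation of China (61070123, 61003155); Natural Science Foundation of Jiangsu Province (BK2011282); Major Research Project of the Natural Science Foundation of the Jiangsu Higher Education Institutions (11KIJ520003); Special Research Project of the Science and Technology Development Center of the Ministry of Education on the Rapid Sharing of Scientific Papers in the Network Era; Jiangsu Graduate Research and Innovation Program for Regular Higher Education Institutions (CXZZ11_0101)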