Abstract: Research on discourse relations aims to infer the semantic relationships that hold between sentences within the same discourse. These relations play an important role in understanding discourse content and analyzing discourse structure, and they have become a research focus in the field of discourse analysis. In this paper, we introduce the background, annotation schemes, and evaluation systems of three corpora, as well as work in this field based on them: Rhetorical Structure Theory Discourse Treebank (RSTDT), Penn Discourse Treebank (PDTB), and HIT Chinese Discourse Treebank (HIT-CDTB). Finally, by analyzing current work, we summarize the main difficulties and challenges in recognizing discourse relations, especially implicit discourse relations.