Abstract: Automatic discourse processing is considered one of the most challenging NLP tasks, and it benefits many downstream NLP tasks such as question answering, automatic summarization, and natural language generation. Recently, the large-scale discourse corpus PDTB (Penn Discourse Treebank) has been made available, providing a common platform for discourse researchers. Based on the PDTB corpus, this paper proposes an end-to-end explicit discourse parser built on conditional random fields. The parser consists of three components joined in a sequential pipeline: a connective classifier, an explicit relation classifier, and a relation argument extractor. We report the performance of each component and analyze the parser's overall performance in detail from the perspective of error cascading.
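The three-stage pipeline and the error-cascading effect described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the stages are passed in as plain functions rather than the CRF-based components the paper uses, and the names `parse`, `Relation`, `end_to_end_f1`, and the `toy_*` stubs are hypothetical. The evaluation assumes exact-match scoring, where a relation counts only if the connective, sense, and both arguments are all correct.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


@dataclass(frozen=True)
class Relation:
    connective: Span
    sense: str
    arg1: Span
    arg2: Span


def parse(tokens: Sequence[str],
          find_connectives: Callable[[Sequence[str]], List[Span]],
          classify_sense: Callable[[Sequence[str], Span], str],
          extract_args: Callable[[Sequence[str], Span], Tuple[Span, Span]]) -> List[Relation]:
    """Run the three stages in sequence; later stages only see what stage 1 found."""
    relations = []
    for conn in find_connectives(tokens):        # stage 1: connective classifier
        sense = classify_sense(tokens, conn)     # stage 2: explicit relation (sense) classifier
        arg1, arg2 = extract_args(tokens, conn)  # stage 3: relation argument extractor
        relations.append(Relation(conn, sense, arg1, arg2))
    return relations


def end_to_end_f1(gold: List[Relation], pred: List[Relation]) -> float:
    """Exact-match F1: a prediction counts only if connective, sense and both args are correct."""
    tp = len(set(gold) & set(pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# --- Toy stubs (hypothetical) standing in for the trained components ---
def toy_find_connectives(tokens):
    # Deliberately weak stage 1: it recognizes only "because" and misses "but".
    return [(i, i + 1) for i, t in enumerate(tokens) if t == "because"]


def toy_classify_sense(tokens, conn):
    return {"because": "Contingency", "but": "Comparison"}[tokens[conn[0]]]


def toy_extract_args(tokens, conn):
    start, end = conn
    # Toy rule: Arg2 runs from the connective to the next comma (or sentence end);
    # Arg1 is everything before the connective.
    stop = tokens.index(",", end) if "," in tokens[end:] else len(tokens)
    return (0, start), (end, stop)


tokens = "I stayed home because it rained , but I was not bored".split()
gold = [
    Relation((3, 4), "Contingency", (0, 3), (4, 6)),
    Relation((7, 8), "Comparison", (0, 6), (8, 12)),
]
pred = parse(tokens, toy_find_connectives, toy_classify_sense, toy_extract_args)
print(pred)
print(round(end_to_end_f1(gold, pred), 3))  # 0.667
```

In this toy run the downstream stubs are correct for every connective they receive, yet the missed *but* in stage 1 can never be recovered later, so end-to-end recall drops to 1/2 and F1 to 2/3. This is the cascading behavior that motivates analyzing each component both in isolation and within the full pipeline.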