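Discourse relations are either explicit or implicit. The defining feature of an explicit relation is an overt connective linking the basic discourse units. Targeting explicit discourse relations in Chinese, this paper builds an explicit discourse relation analysis platform comprising Chinese connective identification and discourse relation classification. We select 500 texts from the Chinese Penn Treebank (CTB) and annotate them with explicit discourse relations. Using the headwords of connectives, we build a connective identification module with a maximum entropy classifier, which achieves an F1 of 66.79%. Using contextual features such as the connective and its part of speech, we build a discourse relation classifier, which achieves an F1 of 91.92% on the four top-level sense classes.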
Abstract
Discourse relations can be expressed explicitly or implicitly. This paper focuses on explicit discourse relations, which are signaled by discourse connectives. We propose an explicit discourse relation parsing platform consisting of connective identification and sense classification. We annotate explicit discourse relations on 500 texts from the Chinese Penn Treebank (CTB). Using the headwords of connectives as features, we build a maximum entropy connective identifier on this corpus, which achieves an F1 of 66.79%. We also propose a sense classifier based on the connective and its context, such as its part of speech, which achieves an F1 of 91.92% on the four top-level sense classes.
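The sense classification step can be illustrated with a minimal sketch of a maximum entropy (multinomial logistic regression) classifier over connective and part-of-speech features. This is not the authors' implementation: the abstract does not give the exact feature templates, toolkit, or label inventory, so the feature names, the PDTB-style top-level labels, the hyperparameters, and the toy training pairs below are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed label set: PDTB-style top-level senses (the paper's own four classes may differ).
LABELS = ["Comparison", "Contingency", "Expansion", "Temporal"]


def extract_features(connective, pos):
    """Hypothetical feature template: the connective, its POS tag, and their conjunction."""
    return [f"conn={connective}", f"pos={pos}", f"conn|pos={connective}|{pos}"]


class MaxEntClassifier:
    """Multinomial logistic regression trained by batch gradient ascent."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # one weight per (feature, label) pair

    def probabilities(self, feats):
        # Softmax over summed feature weights, shifted by the max for numerical stability.
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return {y: e / total for y, e in zip(self.labels, exps)}

    def fit(self, data, epochs=300, lr=0.5, l2=0.01):
        n = len(data)
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probabilities(feats)
                for f in feats:
                    for y in self.labels:
                        # Log-likelihood gradient: observed count minus expected count.
                        grad[(f, y)] += (1.0 if y == gold else 0.0) - p[y]
            for key, g in grad.items():
                # Averaged gradient step with L2 regularization.
                self.weights[key] += lr * (g / n - l2 * self.weights[key])

    def predict(self, feats):
        p = self.probabilities(feats)
        return max(p, key=p.get)


# Toy (connective, POS tag) -> sense pairs, invented for illustration only.
TRAIN = [
    (("因为", "CS"), "Contingency"),
    (("所以", "AD"), "Contingency"),
    (("但是", "AD"), "Comparison"),
    (("然而", "AD"), "Comparison"),
    (("然后", "AD"), "Temporal"),
    (("之后", "LC"), "Temporal"),
    (("而且", "CC"), "Expansion"),
    (("并且", "CC"), "Expansion"),
]

model = MaxEntClassifier(LABELS)
model.fit([(extract_features(c, t), y) for (c, t), y in TRAIN])
```

Calling `model.predict(extract_features(connective, pos))` returns the most probable sense label, and `model.probabilities(...)` exposes the full distribution. The same classifier shape could serve the connective identification step by swapping in binary labels (connective vs. non-connective) and headword-based features.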
Keywords
connective identification /
sense classification /
maximum entropy classifier
Keywords
connective identification /
sense classification /
maximum entropy classifier
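Funding
National Natural Science Foundation of China (61333018); National High Technology Research and Development Program of China (863 Program) (2012AA011102); National Natural Science Foundation of China (61273320)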