Intra-Sentence Relationship Annotation Scheme for Chinese Discourse Analysis
(汉语篇章级小句关系的标注体系)

WU Yunfang, XU Yifeng, WANG Kairan

Journal of Chinese Information Processing (中文信息学报), 2015, Vol. 29, Issue 3: 71-81.
Column: 语编标注与推理

Abstract

Automatic analysis of inter-sentence relations belongs to discourse semantics and has attracted strong interest in recent years. Compared with the large body of work on English, automatic discourse analysis for Chinese has only just begun; one significant reason is that no well-annotated Chinese discourse corpus is publicly available. Working within the framework of Rhetorical Structure Theory (RST) and taking the characteristics of Chinese into account, this paper proposes a complete annotation scheme for discourse-level clause (intra-sentence) relations in Chinese. The scheme describes Chinese topic structure and logical relations within a single framework, dividing clause relations into two major classes: event attachment relations and event logical relations. The logical relations comprise 6 types and 15 subtypes. So far, clause relations have been annotated for 8,000 sentences from the People's Daily corpus. Of these, 1,000 sentences were annotated double-blind to measure inter-annotator agreement, which reveals how difficult clause-relation annotation is for a paratactic language such as Chinese. Based on the annotated data, we also give a quantitative analysis of the relation types and point out the key issues and difficulties that automatic analysis of Chinese inter-sentence relations will face.
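The abstract does not say which agreement statistic was used for the double-blind check. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected measure, for two annotators. The function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the paper's data; the two label names simply echo the two major classes named above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: this is one possible agreement measure; the paper's
    abstract does not state which statistic the authors used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (hypothetical), one per clause pair.
annotator_1 = ["attachment", "logic", "logic", "attachment"]
annotator_2 = ["attachment", "logic", "attachment", "attachment"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Raw percentage agreement would overstate reliability when one relation class dominates, which is why a chance-corrected measure is often preferred for relation labels.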

Key words

discourse relation / intra-sentence relationship / corpus annotation

Cite this article

WU Yunfang, XU Yifeng, WANG Kairan. Intra-Sentence Relationship Annotation Scheme for Chinese Discourse Analysis. Journal of Chinese Information Processing. 2015, 29(3): 71-81

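Funding

National Natural Science Foundation of China (61371129); National Basic Research Program of China (973 Program) (2014CB340504); Major Program of the National Social Science Foundation of China (12&ZD227); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDD201402, ICDD201302)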