Evaluation for Alignment Annotation of Chinese-English Discourse Treebank

FENG Wenhe, LI Yancui, REN Han, ZHOU Guodong

Journal of Chinese Information Processing, 2017, Vol. 31, Issue 3: 86-93.
Section: Language Resource Construction

Abstract

The Chinese-English Discourse Treebank (CEDT) is a parallel corpus in which Chinese-English translated texts are annotated with aligned discourse structure information. Alignment annotation is its core task and follows the basic principle of "structure alignment and relation alignment". Using the alignment annotation platform we developed, we carried out manual alignment annotation experiments, proposed evaluation methods for each alignment annotation task (segmentation alignment, structure alignment, relation alignment, connective alignment, and relation role and center alignment), and analyzed the evaluation results. The experiments show that alignment annotation is a reasonable and effective way to build the CEDT.
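The abstract lists the alignment tasks whose annotations are evaluated but does not state the scoring formulas. As a hedged illustration only, the sketch below assumes that each annotator's alignment is a set of links between Chinese and English discourse units, scores link-level agreement between two annotators with F1, and scores relation agreement as the share of commonly labeled links that received the same relation. The `Link` representation, the functions `alignment_f1` and `relation_agreement`, and the relation labels are hypothetical and are not the authors' published method.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a link pairs a tuple of Chinese unit indices
# with a tuple of English unit indices (allows 1:1, 1:n and n:1 alignments).
Link = Tuple[Tuple[int, ...], Tuple[int, ...]]


def alignment_f1(a: Set[Link], b: Set[Link]) -> float:
    """F1 between two annotators' link sets (symmetric in a and b)."""
    if not a and not b:
        return 1.0  # neither annotator aligned anything: full agreement
    matched = len(a & b)  # links both annotators produced exactly
    if matched == 0:
        return 0.0
    precision = matched / len(a)
    recall = matched / len(b)
    return 2 * precision * recall / (precision + recall)


def relation_agreement(rel_a: Dict[Link, str], rel_b: Dict[Link, str]) -> float:
    """Share of links labeled by both annotators that got the same relation."""
    shared = rel_a.keys() & rel_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for link in shared if rel_a[link] == rel_b[link])
    return same / len(shared)


if __name__ == "__main__":
    # Toy data: the annotators disagree on how Chinese unit 3 aligns.
    ann_a = {((0,), (0,)), ((1, 2), (1,)), ((3,), (2, 3))}
    ann_b = {((0,), (0,)), ((1, 2), (1,)), ((3,), (2,))}
    print(round(alignment_f1(ann_a, ann_b), 4))  # 0.6667

    # Hypothetical relation labels attached to the shared links.
    rel_a = {((0,), (0,)): "causal", ((1, 2), (1,)): "elaboration"}
    rel_b = {((0,), (0,)): "causal", ((1, 2), (1,)): "coordination"}
    print(relation_agreement(rel_a, rel_b))  # 0.5
```

F1 is used in this sketch because it does not require treating either annotator as the gold standard; a chance-corrected measure such as Cohen's kappa would be a reasonable alternative for the relation labels.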

Key words

discourse structure / parallel corpus / alignment annotation / structural alignment / alignment evaluation

Cite this article

FENG Wenhe, LI Yancui, REN Han, ZHOU Guodong. Evaluation for Alignment Annotation of Chinese-English Discourse Treebank. Journal of Chinese Information Processing, 2017, 31(3): 86-93.

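Funding

Humanities and Social Sciences Research Project of the Ministry of Education (13YJC740022, 15YJC740021); Major Basic Research Project in Philosophy and Social Sciences of Henan Higher Education Institutions (2015-JCZD-022); China Postdoctoral Science Foundation (2013M540594); National Natural Science Foundation of China (61402341, 61502149, 61273320); 2016 Open Bidding Projects of the Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies (LEC2016ZBKT001, LEC2016ZBKT002)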