Abstract
Discourse cohesion analysis is fundamental to discourse understanding, and Chinese and English differ in their main cohesive devices, such as reference, conjunction, and ellipsis. This paper presents Chinese-English cohesion alignment annotation strategies for clauses, connectives, reference, and ellipsis, and builds a corpus of 200 aligned documents. It then evaluates the quality of the annotated corpus and discusses the difficulties encountered during annotation along with their solutions. The annotation consistency for clauses, connectives, and reference in the corpus is 0.909, 0.876, and 0.920, respectively. Clause segmentation and connective recognition experiments on the corpus show that the annotation strategy is feasible and that the annotation quality meets practical needs.
Key words
discourse cohesion /
aligned corpus annotation /
reference /
ellipsis /
conjunction
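Funding
National Natural Science Foundation of China (61502149); Henan Province Science and Technology Program (182102210048); Guangdong Basic and Applied Basic Research Foundation (2020A1515011056)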