A Guideline for Chinese-English Word Alignment

ZHAO Hongmei, LIU Qun, ZHANG Ruiqiang, LV Yajuan, Eiichiro SUMITA, Chooi-Ling GOH

Journal of Chinese Information Processing, 2009, Vol. 23, Issue (3): 65-88.
Review

Abstract

This paper presents a new guideline for Chinese-English word alignment. Building on the existing Guidelines for Chinese-English Word Alignment (Linguistic Data Consortium, 2006), it substantially revises and extends that guideline and, in particular, introduces a new annotation scheme that divides word links into genuine links and pseudo links, with genuine links further divided into strong links and weak links. This finer-grained distinction better captures the characteristics of cross-lingual word alignment. The guideline has been applied in a large-scale manual Chinese-English word alignment task, and the consistency of the resulting annotation was evaluated. Under the guideline, both intra-annotator and inter-annotator agreement were satisfactory: the Kappa coefficients for strong, weak, and pseudo links were 0.99, 0.98, and 0.93 for intra-annotator agreement and 0.96, 0.83, and 0.68 for inter-annotator agreement. Finally, a simple experiment gives preliminary evidence that alignments annotated under the guideline are useful for statistical machine translation (SMT).
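As a rough illustration of how the three link types and the reported agreement figures fit together, the Python sketch below stores one annotator's alignment of a sentence pair as a mapping from (Chinese index, English index) to a link type, and computes Cohen's kappa, kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement and P_e the agreement expected by chance. The unit of agreement is an assumption made here, not something the abstract specifies: every cell of the alignment matrix is treated as one yes/no item, separately for each link type. The toy 3x3 sentence pair and its labels are invented only to exercise the code, and the helpers `cohen_kappa`, `per_type_kappa`, and `to_pharaoh` are hypothetical names. The same kappa function applies to intra-annotator comparisons (one annotator, two passes) and inter-annotator comparisons (two annotators).

```python
from collections import Counter
from itertools import product

# Link types named in the guideline: strong and weak links are the two
# kinds of genuine link; pseudo links form a separate category.
GENUINE = ("strong", "weak")
LINK_TYPES = GENUINE + ("pseudo",)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # P_o: fraction of items on which the two passes agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P_e: chance agreement from each pass's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both passes used one identical label everywhere; kappa is
        # undefined here, so 1.0 is returned by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def per_type_kappa(links_a, links_b, src_len, tgt_len):
    """Kappa per link type, treating every cell of the src_len x tgt_len
    alignment matrix as one yes/no item (an assumption of this sketch).

    links_a, links_b: dicts mapping (src_index, tgt_index) -> link type.
    """
    cells = list(product(range(src_len), range(tgt_len)))
    return {
        t: cohen_kappa([links_a.get(c) == t for c in cells],
                       [links_b.get(c) == t for c in cells])
        for t in LINK_TYPES
    }


def to_pharaoh(links, keep=GENUINE):
    """Serialize links of the chosen types as zero-based 'i-j' pairs."""
    return " ".join(f"{i}-{j}" for (i, j), t in sorted(links.items()) if t in keep)


# Toy 3x3 example; the labels are invented purely to exercise the code.
annotator_a = {(0, 0): "strong", (1, 1): "strong", (2, 2): "weak", (2, 1): "pseudo"}
annotator_b = {(0, 0): "strong", (1, 1): "strong", (2, 2): "pseudo"}

if __name__ == "__main__":
    print(per_type_kappa(annotator_a, annotator_b, 3, 3))
    print(to_pharaoh(annotator_a))
```

`to_pharaoh` writes links in the zero-based `i-j` alignment format commonly read by phrase-based toolkits such as Moses. Keeping only genuine links by default is one possible design choice for feeding such annotations to an SMT system; the abstract does not say how the paper's experiment used pseudo links.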

Key words

artificial intelligence / machine translation / annotation guidelines for Chinese-English word alignment / manual word alignment / genuine link / pseudo link / strong link / weak link / alignment and annotation agreement

Cite this article

ZHAO Hongmei, LIU Qun, ZHANG Ruiqiang, LV Yajuan, Eiichiro SUMITA, Chooi-Ling GOH. A Guideline for Chinese-English Word Alignment. Journal of Chinese Information Processing, 2009, 23(3): 65-88.

References

[1] F. J. Och and H. Ney. A systematic comparison of various statistical alignment models [J]. Computational Linguistics, 2003, 29(1): 19-51.
[2] D. Melamed. Annotation style guide for the Blinker project, Version 1.0.4 [R]. IRCS Technical Report #98-06. Philadelphia: University of Pennsylvania, 1998.
[3] J. Véronis. ARCADE tagging guidelines for word alignment, Version 1.0 [OL]. 1998. http://aune.lpl.univ-aix.fr/projects/arcade/2nd/word/guide/index.html
[4] Linguistic Data Consortium. Guidelines for Chinese-English Word Alignment, Version 1.1 [OL]. 2006. http://projects.ldc.upenn.edu/gale/Alignment/specs/GALE_Chinese_alignment_guidelines_v1.1.pdf
[5] Linguistic Data Consortium. Guidelines for Chinese-English Word Alignment, Version 3.0 [OL]. 2008. http://projects.ldc.upenn.edu/gale/Alignment/specs/GALE_Chinese_alignment_guidelines_v3.0.pdf
[6] F. J. Och and H. Ney. Improved statistical alignment models [C]// Proceedings of the 38th Annual Meeting of the ACL. Hong Kong, China, 2000: 440-447.
[7] J. Cohen. A coefficient of agreement for nominal scales [OL]. 1960. http://www.garfield.library.upenn.edu/classics1986/A1986AXF2600001.pdf
[8] J. Carletta. Assessing agreement on classification tasks: the kappa statistic [OL]. 1996. http://acl.ldc.upenn.edu/J/J96/J96-2004.pdf
[9] K. Krippendorff. Content Analysis: An Introduction to Its Methodology [M]. Beverly Hills: Sage Publications, 1980.
[10] P. Koehn et al. Moses: Open source toolkit for statistical machine translation [C]// Proceedings of the ACL Demo and Poster Sessions. 2007: 177-180.
[11] P. Koehn, F. J. Och and D. Marcu. Statistical phrase-based translation [C]// Proceedings of HLT/NAACL. 2003: 81-88.
[12] F. J. Och. Minimum error rate training in statistical machine translation [C]// Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. 2003: 160-167.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation [C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, PA, 2002: 311-318.