A Guideline for Chinese-English Word Alignment

Journal of Chinese Information Processing, 2009, Vol. 23, Issue 3: 65-88.

Review

ZHAO Hongmei¹, LIU Qun¹, ZHANG Ruiqiang², LV Yajuan¹, Eiichiro SUMITA², Chooi-Ling GOH²

Abstract

This paper presents a new guideline for Chinese-English word alignment. The guideline builds on the existing Guidelines for Chinese-English Word Alignment (Linguistic Data Consortium, 2006) and substantially revises and extends them. In particular, it proposes a new annotation scheme that divides word alignment links into genuine links and pseudo links, with genuine links further divided into strong links and weak links. This finer-grained distinction better captures the characteristics of cross-lingual word alignment. The guideline has been applied in a large-scale manual Chinese-English word alignment annotation task, and annotation agreement was evaluated. Under the guideline, both intra-annotator and inter-annotator agreement were good: the Kappa coefficients for strong, weak, and pseudo links were 0.99, 0.98, and 0.93 (intra-annotator) and 0.96, 0.83, and 0.68 (inter-annotator), respectively. Finally, a simple experiment provides preliminary evidence that word alignments annotated under this guideline are useful for statistical machine translation (SMT).

Key words: artificial intelligence; machine translation; annotation guidelines for Chinese-English word alignment; manual word alignment; genuine link; pseudo link; strong link; weak link; alignment and annotation agreement
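
The Kappa coefficients reported in the abstract measure agreement corrected for chance. As a minimal sketch, assuming the standard two-annotator definition from Cohen (1960), kappa = (p_o - p_e) / (1 - p_e), the computation looks like the following. The function name `cohen_kappa`, the label set, and the toy data are hypothetical and not taken from the paper, and the paper's exact per-link-type procedure may differ.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of nominal labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two marginal rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (illustrative only, not data from the paper).
annotator_1 = ["strong", "strong", "weak", "pseudo", "none", "none", "strong", "weak"]
annotator_2 = ["strong", "strong", "weak", "none", "none", "none", "strong", "pseudo"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # prints 0.652
```

Here the annotators agree on 6 of 8 items (p_o = 0.75), chance agreement is p_e = 18/64, and kappa is therefore 15/23. A per-link-type value such as those in the abstract could be obtained by first collapsing the labels to a binary "this type" versus "other" decision, which is an assumption about the evaluation setup rather than something stated on this page.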

Cite this article

ZHAO Hongmei, LIU Qun, ZHANG Ruiqiang, LV Yajuan, Eiichiro SUMITA, Chooi-Ling GOH. A Guideline for Chinese-English Word Alignment. Journal of Chinese Information Processing. 2009, 23(3): 65-88

References

[1] F.J. Och and H. Ney. A systematic comparison of various statistical alignment models [J]. Computational Linguistics, 2003, 29(1): 19-51.
[2] D. Melamed. Annotation style guide for the Blinker project, Version 1.0.4 [R]. IRCS Technical Report #98-06. Philadelphia: University of Pennsylvania, 1998.
[3] Jean Véronis. ARCADE Tagging guidelines for word alignment, Version 1.0 [OL]. 1998. http://aune.lpl.univ-aix.fr/projects/arcade/2nd/word/guide/index.html.
[4] Linguistic Data Consortium. Guidelines for Chinese-English Word Alignment, Version 1.1 [OL]. 2006. http://projects.ldc.upenn.edu/gale/Alignment/specs/GALE_Chinese_alignment_guidelines_v1.1.pdf.
[5] Linguistic Data Consortium. Guidelines for Chinese-English Word Alignment, Version 3.0 [OL]. 2008. http://projects.ldc.upenn.edu/gale/Alignment/specs/GALE_Chinese_alignment_guidelines_v3.0.pdf.
[6] F.J. Och and H. Ney. Improved statistical alignment models [C]// Proc. of the 38th Annual Meeting of the ACL. Hong Kong, China, 2000: 440-447.
[7] J. Cohen. A coefficient of agreement for nominal scales [OL]. 1960. http://www.garfield.library.upenn.edu/classics1986/A1986AXF2600001.pdf.
[8] J. Carletta. Assessing agreement on classification tasks: the Kappa statistic [OL]. 1996. http://acl.ldc.upenn.edu/J/J96/J96-2004.pdf.
[9] K. Krippendorff. Content Analysis: An Introduction to Its Methodology [M]. Beverly Hills: Sage Publications, 1980.
[10] Philipp Koehn et al. Moses: Open source toolkit for statistical machine translation [C]// Proceedings of the ACL Demo and Poster Sessions. 2007: 177-180.
[11] Philipp Koehn, Franz Josef Och and Daniel Marcu. Statistical phrase-based translation [C]// Proceedings of HLT/NAACL. 2003: 81-88.
[12] Franz Josef Och. Minimum Error Rate Training in Statistical Machine Translation [C]// Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. 2003: 160-167.
[13] K. Papineni, S. Roukos, T. Ward, and W.J. Zhu. BLEU: a Method for Automatic Evaluation of Machine Translation [C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, PA, 2002: 311-318.


Received: 2008-11-24
Published: 2009-06-15
Issue date: 2009-06-15
