Deep Difference Feature Enhanced Automatic Evaluation of Machine Translation

ZHI Siwei, LI Maoxi, WU Shuixiu, CHEN Youde

Journal of Chinese Information Processing, 2024, Vol. 38, Issue (10): 46-53.
Section: Machine Translation

Abstract

Automatic evaluation of machine translation quantifies translation quality by comparing the output of a machine translation system with human reference translations, and it plays an important role in machine translation research and applications. The current mainstream approach uses pre-trained contextual language models to represent the machine translation and the human reference translation, then directly concatenates the two representation vectors and feeds them into a feed-forward neural network layer to predict translation quality; it does not explicitly model the differences between the two in a unified semantic space. This paper proposes an automatic evaluation method for machine translation enhanced by deep difference features. The method uses a multi-head attention mechanism to build deep abstractions of the machine translation and the human reference translation, and uses their difference features in a unified semantic space to enhance UniTE_UP, the current state-of-the-art automatic evaluation method. The features extracted by the two components interact deeply, so that the differences between the machine translation and the human reference translation are modeled directly and explicitly. Experimental results on the WMT21 machine translation automatic evaluation benchmark dataset show that the method effectively improves the correlation between automatic and human evaluation. Ablation studies and in-depth experimental analysis further confirm the effectiveness of the deep difference features.
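
The abstract does not spell out how the difference features are computed, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes the machine translation and the reference have already been encoded into fixed-length vectors in the same semantic space, and it uses the element-wise absolute difference and element-wise product as the difference features, which is a common choice in learned metrics. The function names `difference_features` and `linear_score` are hypothetical, and the single linear layer merely stands in for a trained feed-forward regressor.

```python
from typing import List, Sequence


def difference_features(mt_vec: Sequence[float], ref_vec: Sequence[float]) -> List[float]:
    """Build [mt; ref; |mt - ref|; mt * ref] from two same-length vectors.

    Assumption: absolute difference and element-wise product serve as
    the explicit "difference features"; the paper may define them differently.
    """
    if len(mt_vec) != len(ref_vec):
        raise ValueError("both vectors must live in the same space (same length)")
    abs_diff = [abs(m - r) for m, r in zip(mt_vec, ref_vec)]
    product = [m * r for m, r in zip(mt_vec, ref_vec)]
    return list(mt_vec) + list(ref_vec) + abs_diff + product


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical stand-in for the feed-forward layer: one linear unit."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy vectors standing in for pooled encoder outputs (illustrative values only).
mt = [0.5, -1.0, 2.0]
ref = [0.5, 1.0, 1.0]
feats = difference_features(mt, ref)

# Untrained placeholder weights; a real metric would learn these from human scores.
weights = [0.1] * len(feats)
score = linear_score(feats, weights)
```

Compared with plain concatenation `[mt; ref]`, the extra components hand the regressor the mismatch between the two translations directly, instead of leaving it to be inferred from the raw vectors.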

Key words

machine translation / automatic evaluation / multi-head attention / pre-trained contextual word embedding / difference features

Cite this article

ZHI Siwei, LI Maoxi, WU Shuixiu, CHEN Youde. Deep Difference Feature Enhanced Automatic Evaluation of Machine Translation. Journal of Chinese Information Processing. 2024, 38(10): 46-53

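Funding

National Natural Science Foundation of China (62366020); Science and Technology Project of the Education Department of Jiangxi Province (GJJ210306)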