Neural Automatic Evaluation of Machine Translation Method Combined with XLM Word Representation

HU Wei1,2, LI Maoxi1,3, QIU Bailian1, WANG Mingwen1,3

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (9): 46-54.
Section: Machine Translation

Abstract

The automatic evaluation of machine translation plays an important role in promoting the development and application of machine translation. It generally measures the quality of a machine translation by computing its similarity to a human reference translation. This paper uses the cross-lingual pre-trained language model XLM to map source sentences, machine translations, and reference translations into the same semantic space, and combines layer-wise attention and intra-attention to extract difference features between source sentences and machine translations, between machine translations and references, and between source sentences and references. These features are then integrated into a Bi-LSTM-based neural automatic evaluation method. Experimental results on the WMT'19 Metrics task dataset show that the neural automatic evaluation method combined with XLM word representations significantly improves its correlation with human judgments.
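The abstract's pipeline — embed the source, machine translation, and reference in one semantic space, then extract pairwise difference features — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `difference_features` combination shown (concatenation, absolute difference, and element-wise product) is a common scheme from sentence-pair models such as InferSent and RUSE, and the toy random vectors stand in for XLM sentence representations; the paper's exact feature design, attention layers, and Bi-LSTM regressor are omitted.

```python
import numpy as np

def difference_features(h_a: np.ndarray, h_b: np.ndarray) -> np.ndarray:
    """Combine two sentence representations into one difference-feature vector.

    Uses the common [h_a; h_b; |h_a - h_b|; h_a * h_b] scheme from
    sentence-pair models; the paper's exact feature design may differ.
    """
    return np.concatenate([h_a, h_b, np.abs(h_a - h_b), h_a * h_b])

# Toy 4-dimensional stand-ins for XLM sentence vectors of the source
# sentence, the machine translation, and the human reference.
rng = np.random.default_rng(0)
src, mt, ref = (rng.random(4) for _ in range(3))

# The three sentence pairs the abstract names, concatenated into the
# feature vector that would feed a downstream quality regressor.
feats = np.concatenate([
    difference_features(src, mt),   # source vs. machine translation
    difference_features(mt, ref),   # machine translation vs. reference
    difference_features(src, ref),  # source vs. reference
])
print(feats.shape)  # (48,): 3 pairs x 4 sub-vectors x 4 dimensions
```

Each pair contributes four sub-vectors of the embedding dimension, so the feature size grows linearly with the number of sentence pairs compared.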

Key words

machine translation / automatic evaluation of machine translation / cross-lingual pre-trained language model / difference features

Cite this article

HU Wei, LI Maoxi, QIU Bailian, WANG Mingwen. Neural Automatic Evaluation of Machine Translation Method Combined with XLM Word Representation. Journal of Chinese Information Processing, 2023, 37(9): 46-54.


Funding

National Natural Science Foundation of China (61662031)