A Novel MT Metric Based on the Hybrid Strategy

MA Qingsong, ZHANG Jinchao, LIU Qun

Journal of Chinese Information Processing ›› 2018, Vol. 32 ›› Issue (9): 11-19.
Machine Translation


Abstract

With the development of machine translation (MT) evaluation, a variety of automatic metrics have been proposed, each assessing the quality of MT hypotheses from a different perspective. This paper proposes a combined MT metric, Blend, that fuses multiple metrics so as to evaluate hypothesis quality from several perspectives at once. Our investigation covers the following aspects: (1) comparing combined metrics trained against Direct Assessment (DA) versus Relative Ranking (RR) human judgments, showing that the more reliable DA judgments yield a better combined metric; (2) comparing SVM and feed-forward neural network (FFNN) learning algorithms for Blend, showing that SVM performs better on the current data sets; (3) building on the SVM variant, examining the contribution of each incorporated metric, in order to find a trade-off between performance and efficiency; (4) applying Blend to other language pairs, demonstrating its stability and generality. Experiments on the WMT16 Metrics task data, together with the results of our participation in the WMT17 Metrics task, show that Blend achieves state-of-the-art correlation with human judgments.
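The core idea described above, namely learning to combine the per-segment scores of several component metrics against DA human judgments, can be sketched as follows. This is a minimal illustration with made-up scores: a plain linear least-squares fit stands in for the SVM regressor used in the paper, and the three "component metrics" are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-segment scores from three component metrics
# (e.g. one n-gram-based, one alignment-based, one edit-rate-based);
# all numbers are invented for illustration.
metric_scores = np.array([
    [0.32, 0.45, 0.60],
    [0.58, 0.61, 0.42],
    [0.71, 0.76, 0.30],
    [0.44, 0.50, 0.55],
    [0.80, 0.82, 0.20],
])

# Hypothetical DA (Direct Assessment) human scores for the same segments.
da_scores = np.array([0.35, 0.55, 0.75, 0.45, 0.85])

# Learn a mapping from component-metric scores to DA scores
# (a linear least-squares fit here; the paper uses SVM regression).
X = np.hstack([metric_scores, np.ones((len(metric_scores), 1))])  # add bias column
weights, *_ = np.linalg.lstsq(X, da_scores, rcond=None)

def blend_score(scores):
    """Score one hypothesis from its component-metric scores."""
    return float(np.dot(np.append(scores, 1.0), weights))

combined = np.array([blend_score(s) for s in metric_scores])

# Metrics tasks evaluate a metric by its correlation with human judgments.
pearson = np.corrcoef(combined, da_scores)[0, 1]
```

Pearson correlation with human scores is also how the WMT Metrics shared tasks rank submitted metrics, which is why the combined metric is trained and evaluated against DA judgments directly.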


Key words

machine translation metric / combined / direct assessment

Cite This Article

MA Qingsong, ZHANG Jinchao, LIU Qun. A Novel MT Metric Based on the Hybrid Strategy. Journal of Chinese Information Processing, 2018, 32(9): 11-19.


Funding

National Natural Science Foundation of China (61379086)