LUO Qi, LI Maoxi. Research on Incorporating the Source Information to Automatic Evaluation of Machine Translation[J]. Journal of Chinese Information Processing, 2021, 35(12): 60-67.
Research on Incorporating the Source Information to Automatic Evaluation of Machine Translation
LUO Qi, LI Maoxi
School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Abstract: Automatic evaluation of machine translation is a key issue in machine translation. Existing work ignores the source sentence entirely and measures translation quality against the reference alone. This paper presents a novel automatic evaluation metric that incorporates source information: quality embeddings that describe translation quality are extracted from tuples consisting of a machine translation and its source sentence, and a deep neural network incorporates them into an automatic evaluation method based on contextual embeddings. Experimental results on the WMT-19 Metrics task dataset show that the proposed method effectively improves correlation with human judgments. Further analysis reveals that source sentence information plays an important role in automatic evaluation of machine translation.
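The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that a quality embedding extracted from the source and the machine translation is combined with a contextual-embedding-based evaluation method through a deep neural network. The vector sizes, the element-wise product and absolute-difference features, the single tanh hidden layer, and every function name below are assumptions made for illustration, and random vectors stand in for real embeddings.

```python
import math
import random


def fuse_features(hyp_vec, ref_vec, quality_vec):
    """Concatenate reference-based comparison features with a source-side quality vector.

    The product and absolute-difference features are an assumed, commonly used
    way to compare two sentence vectors; the abstract does not specify them.
    """
    product = [h * r for h, r in zip(hyp_vec, ref_vec)]
    abs_diff = [abs(h - r) for h, r in zip(hyp_vec, ref_vec)]
    return hyp_vec + ref_vec + product + abs_diff + quality_vec


def feed_forward_score(features, w_hidden, b_hidden, w_out, b_out):
    """Map the fused features to a scalar score: one tanh hidden layer, then a linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def random_vector(rng, size):
    """Hypothetical stand-in for an embedding or an untrained weight vector."""
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


rng = random.Random(0)
dim, quality_dim, hidden_dim = 4, 3, 5  # toy sizes, chosen only for the example

hyp_vec = random_vector(rng, dim)              # machine translation sentence vector
ref_vec = random_vector(rng, dim)              # reference sentence vector
quality_vec = random_vector(rng, quality_dim)  # quality vector from (source, translation)

features = fuse_features(hyp_vec, ref_vec, quality_vec)
w_hidden = [random_vector(rng, len(features)) for _ in range(hidden_dim)]
b_hidden = [0.0] * hidden_dim
w_out = random_vector(rng, hidden_dim)

score = feed_forward_score(features, w_hidden, b_hidden, w_out, 0.0)
print(len(features), score)
```

In a real system the weights would be trained against human judgment scores, and the three input vectors would come from a pretrained contextual encoder and a quality estimation model rather than from a random generator.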