Abstract
When reproducing the unsupervised neural machine translation method proposed by Facebook AI Research on Chinese-English data, we found that the model degenerates. This paper analyzes the possible causes of the degeneration and proposes three simple yet effective methods to suppress it. First, we mask non-target-language words in the translation output. Second, we use a bilingual dictionary to translate the degenerated output word by word into the target language. Third, we add a small parallel corpus (100k sentence pairs) to the training process. Experimental results show that all three methods effectively suppress model degeneration. In the fully unsupervised setting, the second method performs better, with a BLEU score of 7.87; in the low-resource setting with 100k parallel sentences, the first method performs better, with a BLEU score of 14.28. The paper also analyzes the reasons for this difference.
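The abstract gives no code, but a minimal sketch may clarify what methods 1 and 2 amount to for the Chinese-to-English direction: masking output tokens that are not in the target language, and translating degenerated output word by word with a bilingual dictionary. The function names, the CJK-script heuristic, and the toy lexicon below are illustrative assumptions, not the authors' implementation.

```python
import re

# Tokens containing CJK ideographs are treated as non-English output
# (a hypothetical script-based heuristic for illustration).
CJK = re.compile(r"[\u4e00-\u9fff]")

def mask_non_target(tokens, mask="<mask>"):
    """Method 1 (sketch): mask tokens that are not in the target
    language, so degenerated source-language output is suppressed."""
    return [mask if CJK.search(tok) else tok for tok in tokens]

def dict_translate(tokens, lexicon):
    """Method 2 (sketch): translate a degenerated output word by word
    with a bilingual dictionary; unknown tokens pass through."""
    return [lexicon.get(tok, tok) for tok in tokens]

if __name__ == "__main__":
    degenerated = ["the", "猫", "sat", "on", "垫子"]
    print(mask_non_target(degenerated))
    # ['the', '<mask>', 'sat', 'on', '<mask>']
    print(dict_translate(degenerated, {"猫": "cat", "垫子": "mat"}))
    # ['the', 'cat', 'sat', 'on', 'mat']
```

In the actual system these operations would presumably be applied to the model's back-translated outputs during training, and the dictionary would come from bilingual lexicon induction; the abstract does not specify those details.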
Keywords
unsupervised neural machine translation / low resource / model degeneration
Funding
National Natural Science Foundation of China (61732005, 61761026, 61672271); National Key Research and Development Program of China (2019QY1801); Major Science and Technology Project of Yunnan Province (202002AD080001)