Low Resource Neural Machine Translation with Enhanced Representation of Rare Words
ZHU Junguo1,2, YANG Fuan1,2, YU Zhengtao1,2, ZOU Xiang1,2, ZHANG Zefeng1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
Abstract: In neural machine translation, low-frequency words are a key factor affecting translation quality, and the problem is especially prominent in low-resource scenarios. This paper proposes a low-resource neural machine translation method that enhances the representation of low-frequency words. The main idea is to use contextual information from monolingual data to learn a probability distribution for each low-frequency word and to recompute the word embeddings of low-frequency words based on this distribution. The Transformer model is then retrained with the new word embeddings, which effectively alleviates the problem of inaccurate low-frequency word representations. Experimental results on the four translation directions between Chinese and Vietnamese and between Chinese and Mongolian show that the proposed method achieves significant improvements over the baseline model.
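To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' exact method. All names are illustrative assumptions: `lm` is a hypothetical language model trained on monolingual data that maps a (1, T) batch of token ids to (1, T, V) next-token logits, `embedding` is the embedding table to be updated, `rare_ids` lists the low-frequency word ids, and `contexts` maps each rare word id to the monolingual context windows preceding it. Each rare word's embedding is replaced by a mixture of the original vocabulary embeddings, weighted by the averaged predictive distribution over its contexts; the Transformer would then be retrained with the updated table.

```python
# Sketch only: re-estimate rare-word embeddings from context-predicted
# distributions over the vocabulary. lm, embedding, rare_ids, and
# contexts are assumed interfaces, not components named in the paper.
import torch

def reestimate_rare_embeddings(lm, embedding, rare_ids, contexts):
    emb = embedding.weight.data          # (V, d) current embedding table
    base = emb.clone()                   # frozen copy so updates don't feed back
    for wid in rare_ids:
        probs = []
        for ctx in contexts[wid]:        # ctx: 1-D LongTensor of token ids
            with torch.no_grad():
                logits = lm(ctx.unsqueeze(0))[0, -1]   # (V,) logits at the rare word's slot
            probs.append(torch.softmax(logits, dim=-1))
        if probs:
            p = torch.stack(probs).mean(dim=0)  # averaged distribution over V
            emb[wid] = p @ base                 # distribution-weighted mixture, shape (d,)
    return embedding
```

The distribution-weighted mixture pulls a rare word's vector toward the well-trained embeddings of the frequent words its contexts predict, which is one straightforward way to realize "recalculating the word embeddings of low-frequency words based on the learned distribution".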