Low Resource Neural Machine Translation with Enhanced Representation of Rare Words

ZHU Junguo, YANG Fuan, YU Zhengtao, ZOU Xiang, ZHANG Zefeng

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (6): 44-51.
Machine Translation



Abstract

In neural machine translation, low-frequency words are a key factor affecting translation quality, a problem that is more prominent in low-resource scenarios. This paper proposes a low-resource neural machine translation method that enhances the representation of low-frequency words. The main idea is to use contextual information from monolingual data to learn a probability distribution for each low-frequency word, and to recompute the word embeddings of low-frequency words based on this distribution. The Transformer model is then retrained with the new word embeddings, effectively alleviating the inaccurate representation of low-frequency words. Experimental results in four directions on Chinese-Vietnamese and Chinese-Mongolian translation tasks show that the proposed method achieves significant improvements over the baseline model.
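The recomputation step described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the vocabulary, embedding matrix, and per-occurrence context distributions below are hypothetical, and in the paper the distributions would come from a model trained on monolingual data rather than being given directly.

```python
import numpy as np

def enhance_rare_embedding(embed, context_dists):
    """Recompute a rare word's embedding as the average, over its
    occurrences, of the expected embedding under each occurrence's
    predicted distribution over the vocabulary.

    embed:         (V, d) embedding matrix
    context_dists: (n, V) one predicted vocabulary distribution per
                   occurrence of the rare word in monolingual data
    """
    # Expected embedding for each occurrence: p @ E, shape (n, d)
    expected = context_dists @ embed
    # Average over occurrences to obtain one enhanced vector
    return expected.mean(axis=0)

# Toy example: 4-word vocabulary, 3-dimensional embeddings
E = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.1, 0.1, 0.1]])   # row 3: poorly trained rare word
# Two occurrences; their contexts mostly predict words 0 and 1
P = np.array([[0.7, 0.2, 0.1, 0.0],
              [0.2, 0.7, 0.1, 0.0]])
E[3] = enhance_rare_embedding(E, P)  # enhanced vector replaces row 3
```

After this step, the enhanced embedding matrix would be used to retrain the translation model, as the abstract describes for the Transformer.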


Key words

low-frequency word representation / information enhancement / low resources / neural machine translation

Cite This Article

ZHU Junguo, YANG Fuan, YU Zhengtao, ZOU Xiang, ZHANG Zefeng. Low Resource Neural Machine Translation with Enhanced Representation of Rare Words. Journal of Chinese Information Processing. 2022, 36(6): 44-51


Funding

National Natural Science Foundation of China (61732005, 62166022, 61866020); General Program of the Yunnan Provincial Department of Science and Technology (202101AT076077); Yunnan Provincial Talent Training Project (KKSY201903018)
