Abstract

Neural machine translation achieves good performance on resource-rich language pairs, but this performance typically depends on large-scale parallel corpora. For the case where only a small set of bilingual parallel sentence pairs exists between minority languages and Chinese, this paper proposes integrating data augmentation into a multi-task learning framework to improve translation quality. First, simple transformations such as word order adjustment and word substitution are applied to the target-side sentences to produce new, inexact sentence pairs as noise. Second, the augmented pseudo-parallel corpus is introduced as a set of auxiliary tasks into a multi-task learning framework to fully train the encoder, directing the network's attention toward producing richer and more accurate encoder representations of the source-language sentences. Experiments in six translation directions on the CCMT 2021 Mongolian-Chinese, Tibetan-Chinese, and Uyghur-Chinese evaluation datasets show that the proposed method significantly outperforms both the baseline system and several common data augmentation methods for machine translation.
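The target-side noising described above can be sketched as two independent operations: a word-order perturbation that swaps adjacent tokens, and a word substitution that replaces tokens with random vocabulary entries. This is a minimal illustrative sketch; the function names, noise probabilities, and toy vocabulary are assumptions for demonstration, not the paper's actual implementation.

```python
import random

def swap_adjacent(tokens, p=0.1, rng=random):
    """Word order adjustment: randomly swap adjacent tokens."""
    tokens = tokens[:]
    i = 0
    while i < len(tokens) - 1:
        if rng.random() < p:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return tokens

def substitute_words(tokens, vocab, p=0.1, rng=random):
    """Word substitution: replace tokens with random vocabulary words."""
    return [rng.choice(vocab) if rng.random() < p else t for t in tokens]

def make_pseudo_pair(src, tgt, vocab, rng=random):
    """Build one pseudo-parallel pair: the source stays intact,
    only the target side is noised."""
    noisy_tgt = substitute_words(swap_adjacent(tgt, rng=rng), vocab, rng=rng)
    return src, noisy_tgt
```

Each pseudo-pair produced this way would then be fed to the shared encoder as an auxiliary-task example alongside the original parallel data.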
Key words
multi-task learning /
data augmentation /
low-resource machine translation
Funding

Key Project of the State Language Commission (ZDI135-118); National Security Research Special Project of Minzu University of China (2022GJAQ03); Graduate Research Practice Project of Minzu University of China (BZKY2021062)