中文信息学报 (Journal of Chinese Information Processing), 2020, Vol. 34, Issue 7: 1-18.
综述 (Review)

神经机器翻译前沿综述

  • 冯洋1,2,邵晨泽1,2

Frontiers in Neural Machine Translation: A Literature Review

  • FENG Yang1,2, SHAO Chenze1,2

摘要

机器翻译是指通过计算机将源语言句子翻译到与之语义等价的目标语言句子的过程,是自然语言处理领域的一个重要研究方向。神经机器翻译仅需使用神经网络就能实现从源语言到目标语言的端到端翻译,目前已成为机器翻译研究的主流方向。该文选取了近期神经机器翻译的几个主要研究领域,包括同声传译、多模态机器翻译、非自回归模型、篇章翻译、领域自适应、多语言翻译和模型训练,并对这些领域的前沿研究进展做简要介绍。

Abstract

Machine translation is the task of using a computer to translate a source-language sentence into a semantically equivalent target-language sentence, and it has become an important research direction in the field of natural language processing. Neural machine translation models, now the mainstream in the research community, can perform end-to-end translation from the source language to the target language using only neural networks. In this paper, we select several major research directions in neural machine translation, including model training, simultaneous translation, multi-modal translation, non-autoregressive translation, document-level translation, domain adaptation, and multilingual translation, and briefly introduce the research progress in each of these directions.
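As an illustration of what "end-to-end" means here, the sketch below shows a toy encoder-decoder translation model in PyTorch. It is not taken from the survey or from any system it reviews; the class name TinyNMT, the layer sizes, and the special-token ids are illustrative assumptions. A single network consumes source-token ids and produces target-token ids, trained with teacher forcing and decoded autoregressively; positional encodings and the training loop are omitted for brevity.

# Minimal illustrative sketch of end-to-end neural machine translation:
# an encoder-decoder Transformer maps source-token ids to target-token ids.
# All sizes and special-token ids below are illustrative assumptions.
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, src_vocab: int, tgt_vocab: int, d_model: int = 256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # Teacher-forced pass: predict each target token from the full source
        # sentence and the gold target prefix (the causal mask hides the future).
        causal = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids),
                             tgt_mask=causal)
        return self.out(h)  # (batch, tgt_len, tgt_vocab) logits

    @torch.no_grad()
    def greedy_translate(self, src_ids, bos_id=1, eos_id=2, max_len=50):
        # Autoregressive inference: grow the target one token at a time.
        tgt = torch.full((src_ids.size(0), 1), bos_id, dtype=torch.long)
        for _ in range(max_len):
            next_tok = self(src_ids, tgt)[:, -1].argmax(-1, keepdim=True)
            tgt = torch.cat([tgt, next_tok], dim=1)
            if bool((next_tok == eos_id).all()):
                break
        return tgt

# Toy usage: translate one "sentence" of source-token ids (untrained model,
# so the output is random; this only shows the end-to-end interface).
model = TinyNMT(src_vocab=1000, tgt_vocab=1000)
print(model.greedy_translate(torch.tensor([[5, 17, 42, 2]])).shape)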

关键词

神经机器翻译 / 模型训练 / 同声传译 / 多模态机器翻译 / 非自回归机器翻译 / 篇章翻译 / 领域自适应 / 多语言翻译

Key words

neural machine translation / model training / simultaneous translation / multi-modal translation / non-autoregressive translation / document-level translation / domain adaptation / multilingual translation

引用本文 (Cite this Article)

冯洋,邵晨泽. 神经机器翻译前沿综述. 中文信息学报. 2020, 34(7): 1-18
FENG Yang, SHAO Chenze. Frontiers in Neural Machine Translation: A Literature Review. Journal of Chinese Information Processing. 2020, 34(7): 1-18

基金 (Funding)

国家重点研发计划政府间国际科技创新合作重点专项(2017YFE0192900)
(Key Program for Intergovernmental International Science and Technology Innovation Cooperation, National Key R&D Program of China)