A Survey of Non-Autoregressive Neural Machine Translation

CAO Hang, HU Chi, XIAO Tong, WANG Chenglong, ZHU Jingbo

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (11): 1-14.
Survey


Abstract

Most current neural machine translation systems decode autoregressively, and this serial decoding leads to low inference efficiency. In contrast, the non-autoregressive approach significantly improves inference speed through parallel decoding and has attracted wide attention from researchers. However, because it discards the dependencies between words within the target sequence, the non-autoregressive approach still lags considerably in translation quality. In recent years, much work has studied how to narrow the translation quality gap between non-autoregressive machine translation (NART) and autoregressive machine translation (ART), yet a systematic review of existing methods and research trends has been lacking. This survey categorizes and summarizes NART methods in detail from the perspective of how they capture target-side dependencies, analyzes the open challenges facing NART research, and collects and organizes the related literature, further classifying it by method, publication venue, and task.
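As background for the abstract above, the contrast between the two decoding paradigms can be written out explicitly. The following formulation is the standard one from the NART literature (e.g., Gu et al. [3]), not text taken from this paper: an autoregressive model factorizes the target distribution as

p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x),

so token t cannot be emitted before tokens 1..t-1, while a non-autoregressive model first predicts a target length T and then emits all tokens independently and in parallel,

p(y \mid x) = p(T \mid x) \prod_{t=1}^{T} p(y_t \mid T, x).

The minimal Python sketch below illustrates the resulting difference at decoding time: the autoregressive loop makes one model call per target token, whereas the non-autoregressive decoder makes a single call for the whole sequence. The functions ar_step and nar_step are hypothetical stand-ins for real model forward passes, not APIs from any particular toolkit.

import numpy as np

def ar_decode(ar_step, src, max_len, bos=1, eos=2):
    # Autoregressive (serial): each step conditions on all previously emitted tokens.
    ys = [bos]
    for _ in range(max_len):
        logits = ar_step(src, ys)          # vocabulary distribution for the next position
        y = int(np.argmax(logits))
        ys.append(y)
        if y == eos:
            break
    return ys[1:]

def nar_decode(nar_step, src, tgt_len):
    # Non-autoregressive (parallel): every position is predicted from the source in one pass,
    # which is fast but drops dependencies between target words (the multi-modality problem).
    logits = nar_step(src, tgt_len)        # shape: (tgt_len, vocab_size)
    return [int(t) for t in np.argmax(logits, axis=-1)]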

Keywords

natural language processing / non-autoregressive method / machine translation

Cite This Article

曹航,胡驰,肖桐,王成龙,朱靖波. 非自回归神经机器翻译综述. 中文信息学报. 2023, 37(11): 1-14
CAO Hang, HU Chi, XIAO Tong, WANG Chenglong, ZHU Jingbo. A Survey of Non-Autoregressive Neural Machine Translation. Journal of Chinese Information Processing. 2023, 37(11): 1-14

References

[1] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]//Proceedings of the 34th International Conference on Machine Learning, 2017:1243-1252.
[2] VASWANI A, SHAZEER N M, PARMAR N, et al. Attention is all you need[C]//Proceedings of the Advances in Neural Information Processing Systems, 2017:6000-6010.
[3] GU J, BRADBURY J, XIONG C, et al. Non-autoregressive neural machine translation[C]//Proceedings of the 6th International Conference on Learning Representations, 2018.
[4] LEE J, MANSIMOV E, CHO K. Deterministic non-autoregressive neural sequence modeling by iterative refinement[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 1173-1182.
[5] SHU R, LEE J, NAKAYAMA H, et al. Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020: 8846-8853.
[6] GHAZVININEJAD M, LEVY O, LIU Y, et al. Mask-predict: Parallel decoding of conditional masked language models[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2019: 6111-6120.
[7] KASAI J, CROSS J, GHAZVININEJAD M, et al. Non-autoregressive machine translation with disentangled context transformer[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 5144-5155.
[8] XIAO T, ZHU J. 机器翻译: 基础与模型 (Machine Translation: Foundations and Models)[M]. Beijing: Publishing House of Electronics Industry, 2021: 373-374.
[9] XIAO Y, WU L, GUO J, et al. A survey on non-autoregressive generation for neural machine translation and beyond[J]. arXiv preprint arXiv:2204.09269, 2022.
[10] KIM Y, RUSH A M. Sequence-level knowledge distillation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016:1317-1327.
[11] ZHOU C, NEUBIG G, GU J. Understanding knowledge distillation in non-autoregressive machine translation[C]//Proceedings of the 8th International Conference on Learning Representations, 2020.
[12] XU W, MA S, ZHANG D, et al. How does distilled data complexity impact the quality and confidence of non-autoregressive machine translation?[C]//Proceedings of the Findings of the Association for Computational Linguistics, 2021:4392-4400.
[13] ZHOU J, KEUNG P. Improving non-autoregressive neural machine translation with monolingual data[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 1893-1898.
[14] DING L, WANG L, LIU X, et al. Understanding and improving lexical choice in non-autoregressive translation[C]//Proceedings of the 9th International Conference on Learning Representations, 2021.
[15] DING L, WANG L, LIU X, et al. Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021: 3431-3441.
[16] SHAO C, WU X, FENG Y. One reference is not enough: Diverse distillation with reference selection for non-autoregressive translation[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2022:3779-3791.
[17] HUANG F, TAO T, ZHOU H, et al. On the learning of non-autoregressive transformers[C]//Proceedings of the International Conference on Machine Learning, 2022:9356-9376.
[18] LEE J, SHU R, CHO K. Iterative refinement in the continuous space for non-autoregressive neural machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020:1006-1015.
[19] STERN M, CHAN W, KIROS J R, et al. Insertion transformer: Flexible sequence generation via insertion operations[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 5976-5985.
[20] GU J, WANG C, ZHAO J, et al. Levenshtein transformer[C]//Proceedings of the Advances in Neural Information Processing Systems, 2019:11179-11189.
[21] DEVLIN J, CHANG M, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2019: 4171-4186.
[22] GHAZVININEJAD M, LEVY O, ZETTLEMOYER L. Semi-autoregressive training improves mask-predict decoding[J]. arXiv preprint arXiv:2001.08785, 2020.
[23] GUO J, XU L, CHEN E. Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 376-385.
[24] DING L, WANG L, WU D, et al. Context-aware cross-attention for non-autoregressive translation[C]//Proceedings of the 28th International Conference on Computational Linguistics, 2020:4396-4402.
[25] NOROUZI S, HOSSEINZADEH R, PÉREZ F, et al. DiMS: Distilling multiple steps of iterative non-autoregressive transformers[J]. arXiv preprint arXiv:2206.02999, 2022.
[26] GENG X, FENG X, QIN B. Learning to rewrite for non-autoregressive neural machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021:3297-3308.
[27] KASAI J, PAPPAS N, PENG H, et al. Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in machine translation[J]. arXiv preprint arXiv:2006.10369, 2020.
[28] KAISER Ł, ROY A, VASWANI A, et al. Fast decoding in sequence models using discrete latent variables[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 2395-2404.
[29] AKOURY N, KRISHNA K, IYYER M. Syntactically supervised transformers for faster neural machine translation[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019: 1269-1281.
[30] BAO Y, HUANG S, XIAO T, et al. Non-autoregressive translation by learning target categorical codes[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2021:5749-5759.
[31] ROY A, VASWANI A, NEELAKANTAN A, et al. Theory and experiments on vector quantized autoencoders[J]. arXiv preprint arXiv:1805.11063, 2018.
[32] BAO Y, ZHOU H, FENG J, et al. Non-autoregressive transformer by position learning[J]. arXiv preprint arXiv:1911.10677, 2019.
[33] RAN Q, LIN Y, LI P, et al. Guiding non-autoregressive neural machine translation decoding with reordering information[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021: 13727-13735.
[34] SONG J, KIM S, YOON S. AligNART: Non-autoregressive neural machine translation by jointly learning to estimate alignment and translate[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2021:1-14.
[35] MA X, ZHOU C, LI X, et al. FlowSeq: Non-autoregressive conditional sequence generation with generative flow[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2019: 4281-4291.
[36] REZENDE D J, MOHAMED S. Variational inference with normalizing flows[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015:1530-1538.
[37] LIBOVICKÝ J, HELCL J. End-to-end non-autoregressive neural machine translation with connectionist temporal classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 3016-3021.
[38] SAHARIA C, CHAN W, SAXENA S, et al. Non-autoregressive machine translation with latent alignments[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020:1098-1108.
[39] WANG Y, TIAN F, HE D, et al. Non-autoregressive machine translation with auxiliary regularization[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 5377-5384.
[40] SUN Z, LI Z, WANG H, et al. Fast structured decoding for sequence models[C]//Proceedings of the Advances in Neural Information Processing Systems, 2019: 3011-3020.
[41] LIU Y, WAN Y, ZHANG J, et al. Enriching non-autoregressive transformer with syntactic and semantic structures for neural machine translation[C]//Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021:1235-1244.
[42] SHAO C, FENG Y, ZHANG J, et al. Retrieving sequential information for non-autoregressive neural machine translation[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019: 3013-3024.
[43] SHAO C, ZHANG J, FENG Y, et al. Minimizing the bag-of-ngrams difference for non-autoregressive neural machine translation[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020: 198-205.
[44] SHAO C, FENG Y, ZHANG J, et al. Sequence-level training for non-autoregressive neural machine translation[J]. Computational Linguistics, 2021: 891-925.
[45] GHAZVININEJAD M, KARPUKHIN V, ZETTLEMOYER L, et al. Aligned cross entropy for non-autoregressive machine translation[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 3515-3523.
[46] DU C, TU Z, JIANG J. Order-agnostic cross entropy for non-autoregressive machine translation[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 2849-2859.
[47] WEI B, WANG M, ZHOU H, et al. Imitation learning for non-autoregressive neural machine translation[C]//Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019:1304-1312.
[48] LI Z, LIN Z, HE D, et al. Hint-based training for non-autoregressive machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2019: 5707-5712.
[49] TU L, PANG R, WISEMAN S, et al. ENGINE: Energy-Based inference networks for non-autoregressive machine translation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 2819-2826.
[50] GUO J, TAN X, XU L, et al. Fine-tuning by curriculum learning for non-autoregressive neural machine translation[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020: 7839-7846.
[51] LIU J, REN Y, TAN X, et al. Task-level curriculum learning for non-autoregressive neural machine translation[C]//Proceedings of the 29th International Joint Conference on Artificial Intelligence, 2020: 3861-3867.
[52] SUN Z, YANG Y. An EM approach to non-autoregressive conditional sequence generation[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 9249-9258.
[53] HAO Y, HE S, JIAO W, et al. Multi-task learning with shared encoder for non-autoregressive machine translation[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2021: 3989-3996.
[54] GUO J, TAN X, HE D, et al. Non-autoregressive neural machine translation with enhanced decoder input[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019: 3723-3730.
[55] ZHAN J, CHEN Q, CHEN B, et al. Non-autoregressive translation with dependency-aware decoder[J]. arXiv preprint arXiv:2203.16266, 2022.
[56] LI X, MENG Y, YUAN A, et al. LAVA NAT: A non-autoregressive translation model with look-around decoding and vocabulary attention[J]. arXiv preprint arXiv:2002.03084, 2020.
[57] DENG Y, RUSH A M. Sequence-to-lattice models for fast translation[C]//Proceedings of the Association for Computational Linguistics, 2021:3765-3772.
[58] ZHENG Z, ZHOU H, HUANG S, et al. Duplex sequence-to-sequence learning for reversible machine translation[C]//Proceedings of the Advances in Neural Information Processing Systems, 2021:21070-21084.
[59] WANG C, ZHANG J, CHEN H. Semi-autoregressive neural machine translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018:479-488.
[60] RAN Q, LIN Y, LI P, et al. Learning to recover from multi-modality errors for non-autoregressive neural machine translation[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 3059-3069.
[61] KONG X, ZHANG Z, HOVY E H. Incorporating a local translation mechanism into non-autoregressive translation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020:1067-1073.
[62] DENG Y, RUSH A M. Cascaded text generation with Markov transformers[C]//Proceedings of the Advances in Neural Information Processing Systems, 2020.
[63] QIAN L, ZHOU H, BAO Y, et al. Glancing transformer for non-autoregressive neural machine translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 2021: 1993-2003.
[64] DING L, WANG L, LIU X, et al. Progressive multi-granularity training for non-autoregressive translation[C]//Proceedings of the Association for Computational Linguistics, 2021: 2797-2803.
[65] XIE P, LI Z, HU X. MvSR-NAT: Multi-view subset regularization for non-autoregressive machine translation[J]. arXiv preprint arXiv:2108.08447, 2021.
[66] ZENG C, CHEN J, ZHUANG T, et al. Neighbors are not strangers: Improving non-autoregressive translation under low-frequency lexical constraints[J]. arXiv preprint arXiv:2204.13355, 2022.
[67] BAO Y, ZHOU H, HUANG S, et al. GLAT: Glancing at latent variables for parallel text generation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022:8398-8409.
[68] HUANG F, ZHOU H, LIU Y, et al. Directed acyclic transformer for non-autoregressive machine translation[C]//Proceedings of the International Conference on Machine Learning, 2022:9410-9428.
[69] HUANG X S, PEREZ F, VOLKOVS M. Improving non-autoregressive translation models without distillation[C]//Proceedings of the International Conference on Learning Representations, 2022.
[70] GU J, KONG X. Fully non-autoregressive neural machine translation: Tricks of the trade[C]//Proceedings of the Association for Computational Linguistics, 2021: 120-133.
[71] HELCL J, HADDOW B, BIRCH A. Non-autoregressive machine translation: It's not as fast as it seems[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, 2022:1780-1790.
[72] REN Y, RUAN Y, TAN X, et al. FastSpeech: Fast, robust and controllable text to speech[C]//Proceedings of the Advances in Neural Information Processing Systems, 2019:3165-3174.
[73] PENG K, PING W, SONG Z, et al. Non-autoregressive neural text-to-speech[C]//Proceedings of the 37th International Conference on Machine Learning, 2020: 7586-7598.
[74] CHEN N, WATANABE S, VILLALBA J, et al. Listen and fill in the missing letters: Non-autoregressive transformer for speech recognition[J]. arXiv preprint arXiv:1911.04908, 2019.
[75] FUJITA Y, WATANABE S, OMACHI M, et al. Insertion-based modeling for end-to-end automatic speech recognition[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020: 3660-3664.
[76] HIGUCHI Y, WATANABE S, CHEN N, et al. Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict[C]//Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020: 3655-3659.
[77] FEI Z. Fast image caption generation with position alignment[J]. arXiv preprint arXiv:1912.06365, 2019.
[78] GAO J, MENG X, WANG S, et al. Masked non-autoregressive image captioning[J]. arXiv preprint arXiv:1906.00717, 2019.
[79] SUSANTO R H, CHOLLAMPATT S, TAN L. Lexically constrained neural machine translation with Levenshtein transformer[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020:3536-3543.
[80] LI P, LI L, ZHANG M, et al. Universal conditional masked language pre-training for neural machine translation[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022.
[81] ZHANG Y, WANG G, LI C, et al. Pointer: Constrained text generation via insertion-based generative pre-training[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2020:8649-8670.

Funding

Supported by the National Natural Science Foundation of China (61876035, 61732005); the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology (2020AAA0107904); the Science and Technology Program of the Yunnan Provincial Department of Science and Technology (202002AD080001, 202103AA080015); and the Fundamental Research Funds for the Central Universities.