Abstract
Neural machine translation (NMT) has recently achieved remarkable success in sentence-level translation. However, it still cannot resolve a wide variety of discourse phenomena, such as lexical cohesion and coreference, which can be alleviated by exploiting context information in document-level translation. In contrast to existing studies that model source-side context, this paper proposes to model target-side context in document-level NMT. Specifically, motivated by deliberation networks, our approach translates the source-side document twice. The first pass performs sentence-level translation; the second pass re-translates each sentence while modeling the target-side context that has just been generated by the first pass over the whole document. Experimental results on the LDC Chinese-to-English and WMT English-to-German document-level translation tasks show that our approach significantly improves translation performance while introducing only a few parameters. Moreover, the proposed approach benefits more as the quality of the first-pass (i.e., sentence-level) translation improves.
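The two-pass scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `first_pass` and `second_pass` are hypothetical stand-ins for the sentence-level NMT model and the context-aware second-pass decoder, and the toy functions below exist only to make the control flow runnable.

```python
from typing import Callable, List

def deliberation_translate(
    doc: List[str],
    first_pass: Callable[[str], str],
    second_pass: Callable[[str, List[str]], str],
) -> List[str]:
    """Two-pass document translation in the style of deliberation networks.

    1) Translate every source sentence independently (sentence-level NMT).
    2) Re-translate each sentence, conditioning on the full first-pass
       draft of the document as target-side context.
    """
    draft = [first_pass(sent) for sent in doc]         # pass 1: sentence-level
    return [second_pass(sent, draft) for sent in doc]  # pass 2: sees whole draft

# Toy stand-ins; a real system would use Transformer encoder-decoders.
def toy_first_pass(src: str) -> str:
    return src.upper()

def toy_second_pass(src: str, context: List[str]) -> str:
    # A real second pass would attend over `context` to enforce
    # document-level consistency; here we just record its size.
    return toy_first_pass(src) + f" [ctx:{len(context)}]"

print(deliberation_translate(["a b", "c d"], toy_first_pass, toy_second_pass))
# → ['A B [ctx:2]', 'C D [ctx:2]']
```

Note that, unlike source-side context models, every sentence in the second pass can condition on first-pass translations of *future* sentences as well as past ones, since the entire draft is available.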
Key words
neural machine translation /
deliberation networks /
document-level translation
Funding
National Natural Science Foundation of China (61876120, 61976148)