Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (8): 43-51.


  • 桑杰端珠1,2,才让加1,2

Dictionary Injection for Tibetan-Chinese Machine Translation Pretraining

  • SANGJIE Duanzhu1,2, CARING Jia1,2


In recent years, pretraining methods have attracted wide attention in natural language processing. In low-resource settings such as Tibetan-Chinese machine translation, however, bilingual supervision cannot take part in pretraining directly, which limits the gains that pretrained models bring to such tasks. Bilingual dictionaries are a rich and inexpensive source of prior translation knowledge, and people in cross-lingual communication often mix languages to communicate more efficiently. Motivated by these two observations, this paper proposes a dictionary-injection pretraining method for Tibetan-Chinese machine translation models, which gives pretraining broad opportunities to learn associations between the two languages. Experiments show that the method improves BLEU over a strong BART baseline by 2.3 and 2.1 points on the Tibetan-to-Chinese and Chinese-to-Tibetan test sets, respectively, confirming its effectiveness for Tibetan-Chinese machine translation.
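The abstract does not spell out the injection procedure, so the following is only a minimal sketch of one plausible reading: a fraction of the words in a monolingual sentence that the dictionary covers are swapped for a dictionary translation, producing mixed-language input that a BART-style denoising objective could learn to restore. The function name `inject_dictionary`, the `ratio` parameter, and the toy Tibetan-Chinese entries are all hypothetical and do not come from the paper.

```python
import random


def inject_dictionary(tokens, dictionary, ratio=0.3, rng=None):
    """Return a copy of `tokens` with some dictionary-covered tokens translated.

    Assumption (not from the paper): a fixed fraction `ratio` of the covered
    tokens is replaced, and one translation is picked uniformly at random.
    """
    rng = rng or random.Random()
    # Positions whose token has at least one entry in the bilingual dictionary.
    candidates = [i for i, tok in enumerate(tokens) if tok in dictionary]
    n_replace = int(len(candidates) * ratio)
    chosen = set(rng.sample(candidates, n_replace))
    return [
        rng.choice(dictionary[tok]) if i in chosen else tok
        for i, tok in enumerate(tokens)
    ]


# Toy entries for illustration only; a real dictionary would be far larger.
toy_dictionary = {"ང": ["我"], "དེབ": ["书"]}
sentence = ["ང", "དེབ", "ཀློག"]

# Under this sketch, a pretraining pair is (mixed input, original sentence).
mixed = inject_dictionary(sentence, toy_dictionary, ratio=1.0, rng=random.Random(0))
pair = (mixed, sentence)
```

With `ratio=1.0` every covered token is translated and uncovered tokens pass through unchanged; lower ratios leave more of the original language in place, which is closer to the mixed-language usage that motivates the method.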


Key words

Tibetan-Chinese / machine translation / pretraining / dictionary injection


SANGJIE Duanzhu, CARING Jia. Dictionary Injection for Tibetan-Chinese Machine Translation Pretraining. Journal of Chinese Information Processing. 2023, 37(8): 43-51.


