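Because Tibetan-Chinese parallel corpora are scarce, Tibetan-Chinese neural machine translation performs poorly. This paper proposes a method for fusing a Tibetan monolingual language model into Tibetan-Chinese neural machine translation: first, a Tibetan monolingual language model is built with a neural network; next, a Tibetan-Chinese neural machine translation model is built with the Transformer; finally, the Tibetan monolingual language model is fused into the Tibetan-Chinese neural machine translation model. Experiments show that the method markedly improves translation quality. The baseline system achieves a BLEU score of 21.1 for Tibetan-to-Chinese and 18.6 for Chinese-to-Tibetan; after the Tibetan monolingual language model is fused in, the scores rise to 24.5 and 23.3, improvements of 3.4 and 4.7 BLEU points, respectively.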
Abstract
To better utilize monolingual Tibetan text in Tibetan-Chinese neural machine translation (NMT), we propose to pre-train a Tibetan neural language model and then integrate it into a Transformer-based Tibetan-Chinese NMT model. Experiments indicate that our approach raises the BLEU score from 21.1 to 24.5 for Tibetan-to-Chinese translation and from 18.6 to 23.3 for Chinese-to-Tibetan translation.
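The abstract does not say how the language model is integrated into the NMT model. As one possible illustration only, the sketch below shows shallow fusion, a common way to combine the two models at decoding time by adding a weighted language-model log-probability to the NMT log-probability of each candidate token. The function names, the `lm_weight` parameter, the floor value, and the toy probabilities are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Dict


def shallow_fusion_scores(nmt_log_probs: Dict[str, float],
                          lm_log_probs: Dict[str, float],
                          lm_weight: float = 0.3) -> Dict[str, float]:
    """Fuse per-token scores: log p_NMT(token) + lm_weight * log p_LM(token).

    Hypothetical helper: tokens the language model has not scored receive
    a small floor probability (assumed value) instead of raising an error.
    """
    floor = math.log(1e-9)
    return {
        token: nmt_lp + lm_weight * lm_log_probs.get(token, floor)
        for token, nmt_lp in nmt_log_probs.items()
    }


def pick_next_token(nmt_log_probs: Dict[str, float],
                    lm_log_probs: Dict[str, float],
                    lm_weight: float = 0.3) -> str:
    """Greedy choice of the next token under the fused score."""
    fused = shallow_fusion_scores(nmt_log_probs, lm_log_probs, lm_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Toy distributions over three candidate tokens (made-up numbers).
    nmt = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
    lm = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

    # Without the language model, the NMT model's top token wins.
    print(pick_next_token(nmt, lm, lm_weight=0.0))  # prints "a"
    # With the language model weighted in, the choice shifts to "b".
    print(pick_next_token(nmt, lm, lm_weight=0.5))  # prints "b"
```

In this toy case the language model changes the decision only because it strongly prefers a token the translation model already rates as nearly as likely; in practice the weight would be tuned on held-out data.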
Key words
Tibetan /
language model /
machine translation /
fusion /
neural net
Funding
National Natural Science Foundation of China (61063033, 61662061); National Key Research and Development Program of China (2017YFB1402200)