CAO Yichao, GAO Yi, LI Miao, FENG Tao, WANG Rujing, FU Sha. Mongolian-Chinese Neural Machine Translation Based on Monolingual Corpora and Word Embedding Alignment[J]. Journal of Chinese Information Processing, 2020, 34(2): 27-32,37.
Mongolian-Chinese Neural Machine Translation Based on Monolingual Corpora and Word Embedding Alignment
CAO Yichao1,2, GAO Yi3, LI Miao1, FENG Tao1,2, WANG Rujing1, FU Sha3
1.Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China; 2.University of Science and Technology of China, Hefei, Anhui 230026, China; 3.Yunnan Minority Languages Guidance Committee Office, Kunming, Yunnan 650118, China
Abstract: To improve Mongolian-Chinese neural machine translation, this paper proposes a method based on monolingual corpora and word embedding alignment. First, the Mongolian and Chinese word embedding spaces are aligned and used to initialize the embedding layers of the model. Second, joint training is employed to train the Mongolian-to-Chinese and Chinese-to-Mongolian translation directions at the same time. Finally, Mongolian and Chinese monolingual corpora are used to train the model as a denoising autoencoder. Experimental results show that the proposed method outperforms the baseline and improves the performance of Mongolian-Chinese neural machine translation.
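The abstract's first step, aligning the two monolingual word embedding spaces, is commonly solved as an orthogonal Procrustes problem: given embeddings for a set of seed translation pairs, find the rotation that maps the source space onto the target space. The following is a minimal NumPy sketch of that idea, not the paper's actual implementation; the function name and the synthetic seed data are illustrative assumptions.

```python
import numpy as np

def align_embeddings(src_emb, tgt_emb):
    """Learn an orthogonal map W such that W @ src ~ tgt (Procrustes).

    src_emb, tgt_emb: (n, d) arrays of embeddings for n seed word pairs.
    Returns W of shape (d, d) with W @ W.T = I.
    """
    # The SVD of the cross-covariance yields the closest orthogonal matrix
    # minimizing ||W @ src.T - tgt.T||_F over orthogonal W.
    u, _, vt = np.linalg.svd(tgt_emb.T @ src_emb)
    return u @ vt

# Toy check: the target space is an exact rotation of the source space,
# so the learned map should recover that rotation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))                # 50 seed pairs, dimension 8
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal "true" map
tgt = src @ q.T
W = align_embeddings(src, tgt)
assert np.allclose(src @ W.T, tgt, atol=1e-8)
```

Once W is learned, every source-language embedding can be mapped into the target space (or both spaces into a shared one) and used to initialize the translation model's embedding layers.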