1. School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China; 2. Microsoft Research Asia, Beijing 100080, China
Abstract: Long-distance reordering is a major challenge in statistical machine translation. Previous work has shown that pre-reordering is a promising way to tackle this problem. In this work, we extend this line of research and propose a neural network based pre-reordering model, which integrates neural network modeling into a linear ordering framework. The neural network based model can leverage syntactic and semantic information extracted from unlabeled data to predict word order differences between languages. Experiments on Chinese-English and Japanese-English machine translation tasks show the effectiveness of our approach.
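As a rough illustration of the linear ordering idea mentioned in the abstract, the sketch below scores a candidate word order by summing a pairwise preference for every pair of source words, then picks the highest-scoring permutation. The abstract does not give the model's details, so everything here is an assumption made for illustration: the names `order_score` and `best_order` are hypothetical, the lookup table `toy_scores` stands in for whatever scores a neural network would produce, and the exhaustive search is only practical for very short sentences.

```python
from itertools import permutations

def order_score(perm, pair_score):
    """Sum the pairwise scores of every pair (i, j) where word i is placed before word j."""
    total = 0.0
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            total += pair_score[(perm[a], perm[b])]
    return total

def best_order(n, pair_score):
    """Exhaustively search all permutations of n source positions (illustration only)."""
    return max(permutations(range(n)), key=lambda p: order_score(p, pair_score))

# Hypothetical scores for a 3-word source sentence: pair_score[(i, j)] is the
# preference for placing source word i before source word j in the target order.
# In a neural pre-reordering model these values would come from the network.
toy_scores = {
    (0, 1): 1.0, (1, 0): 0.0,
    (0, 2): 0.9, (2, 0): 0.1,
    (1, 2): 0.2, (2, 1): 0.8,
}

best = best_order(3, toy_scores)
print(best, order_score(best, toy_scores))
```

With these toy scores, the preferred order moves the third source word ahead of the second, which is the kind of word order difference a pre-reordering step is meant to predict before translation.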