Abstract: Japanese predicate suffixes mark voice in complex ways. Passive and potential predicates are formed with the same suffix, which originated from the same stem, and this ambiguity causes mistranslation in statistical machine translation. This paper proposes a new method for rule selection among different voices. Maximum entropy models are built to classify passive and potential voice, and the resulting voice features are integrated into the log-linear translation model. In a large-scale Japanese-to-Chinese translation experiment, the approach improves translation performance from 41.50 to 42.01 in BLEU score, and informativeness is 2.71% higher according to human evaluation.
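As a rough illustration of the idea summarized above, the sketch below shows how a maximum entropy (softmax) voice classifier could supply one extra feature to a log-linear rule score. It is not the paper's implementation: the context features (`particle=ni`, `verb_class=ichidan`), all weights, the feature names `tm`, `lm`, and `voice`, and the helper functions are hypothetical and chosen only to make the example run.

```python
import math

VOICES = ("passive", "potential")

def maxent_voice_probs(context, weights):
    """P(voice | context) under a maximum entropy (softmax) model.

    `context` is a list of active binary features; `weights` maps
    (voice, feature) pairs to real-valued weights (missing pairs count as 0).
    """
    scores = {v: sum(weights.get((v, f), 0.0) for f in context) for v in VOICES}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    z = sum(exps.values())
    return {v: e / z for v, e in exps.items()}

def loglinear_score(feature_values, lambdas):
    """Log-linear model score: weighted sum of feature values."""
    return sum(lambdas[name] * value for name, value in feature_values.items())

def rule_score(rule_voice, base_features, voice_probs, lambdas):
    """Score a translation rule, adding log P(rule_voice | context) as a feature."""
    features = dict(base_features)
    features["voice"] = math.log(voice_probs[rule_voice])
    return loglinear_score(features, lambdas)

# Hypothetical classifier weights and source-side context (assumptions).
weights = {
    ("passive", "particle=ni"): 1.2,
    ("potential", "particle=ga"): 0.9,
    ("potential", "verb_class=ichidan"): 0.3,
}
context = ["particle=ni", "verb_class=ichidan"]
voice_probs = maxent_voice_probs(context, weights)

# Hypothetical log-linear weights and baseline feature values (log-probabilities).
lambdas = {"tm": 1.0, "lm": 0.5, "voice": 0.8}
base = {"tm": -2.0, "lm": -3.0}

passive_score = rule_score("passive", base, voice_probs, lambdas)
potential_score = rule_score("potential", base, voice_probs, lambdas)
best_voice = "passive" if passive_score > potential_score else "potential"
```

With identical baseline features, the two candidate rules differ only in the voice feature, so the rule whose voice the classifier prefers receives the higher score. In a real system the weights would be learned from data rather than set by hand.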