日语中谓词语态有不同的词尾变形,其中被动态和可能态具有相同的词尾变化,在统计机器翻译中难以对其正确区分及翻译。因此,该文提出一种利用最大熵模型有效地对日语可能态和被动态进行分类,然后把日语的可能态和被动态特征有效地融合到对数线性模型中改进翻译模型的方法,以提高可能态和被动态翻译规则选择的准确性。实验结果表明,该方法可以有效提升日语可能态和被动态句子的翻译质量,在大规模日汉语料上,最高翻译BLEU值能够由41.50提高到42.01,并在人工评测中,翻译结果的整体可理解度得到了2.71%的提升。
Abstract
The suffixes of Japanese predicates have complex formation of different voice. Both passive and potential predicates are formed with the same suffix which originated from the same stem, which cause mistranslation in statistical machine translation. In this paper, a new method has been proposed for rule selection among different voice. Maximum entropy models are built to effectively classify passive and potential voice, and then voice features are integrated into the log-linear model translation model. In Japanese to Chinese translation task, large scale experiment shows that our approach improves the translation performance from 41.50 to 42.01 in BLEU score, and the informativness is 2.71% higher according to the human evaluation results.
关键词
被动态 /
可能态 /
统计机器翻译 /
最大熵模型
{{custom_keyword}} /
Key words
passive voice /
active voice /
statistical machine translation /
maximum entropy models
/
/
/
/
/
/
/
/
{{custom_keyword}} /
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] Nakamura H. Two Types of Complex Predicate Formation: Japanese Passive and Potential Verbs[C]//Proceedings of the Pacific Asia Conference on Languages, Information, and Computation. 2007: 340-348.
[2] Alam Y S. A Rule-based Morpho-semantic Analyzer of the Japanese Verb Phrases of Simple Sentences[C]//Proceedings of the PACLIC. 2008: 101-112.
[3] 卜,朝暉, 浅井,良信, 王,軼謳, et al. 日中機械翻訳における構文上の対応のずれに関する考察 : 受動態と能動態のずれ、品詞のずれを中心に(翻訳)[J]. 情報処理学会研究報告: 自然言語処理研究会報告, 2006, 2006(124): 33-40.
[4] Xiong D, Liu Q, Lin S. Maximum entropy based phrase reordering model for statistical machine translation[C]//Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006: 521-528.
[5] He Z, Liu Q, Lin S. Improving statistical machine translation using lexicalized rule selection[C]//Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1. Association for Computational Linguistics, 2008: 321-328
[6] Van Nguyen V, Shimazu A, Le Nguyen M, et al. Improving a lexicalized hierarchical reordering model using maximum entropy[C]//Proceedings of the MT Summit XII, Ottawa, Canada, August, 2009.
[7] Iglesias G, de Gispert A, Banga E R, et al. Rule filtering by pattern for efficient hierarchical translation[C]//Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2009: 380-388.
[8] Shen L, Xu J, Weischedel R M. A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model[C]//Proceedings of the ACL. 2008: 577-585.
[9] DCˇmejrek M, Zhou B, Xiang B. Enriching SCFG rules directly from efficient bilingual chart parsing[C]//Proceeding of the International Workshop on Spoken Language Translation. 2009: 136-143.
[10] Gao Y, Koehn P, Birch A. Soft dependency constraints for reordering in hierarchical phrase-based translation[C]//proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, 2011: 857-868.
[11] Chiang D. A hierarchical phrase-based model for statistical machine translation[C]//Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2005: 263-270.
[12] Chiang D. Hierarchical Phrase-Based Translation[J]. Computational Linguistics, 2007, 33(2): 201-228.
[13] Kawahara D, Kurohashi S. Case frame compilation from the web using high-performance computing[C]//Proceedings of the 5th International Conference on Language Resources and Evaluation.2006: 1344-1347.
[14] Murata M, Shirado T, Kanamaru T, et al. Machine-learning-based transformation of passive Japanese sentences into active by separating training data into each input particle[C]//Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006: 587-594.
[15] Sasano R, Kawahara D, Kurohashi S, et al. Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese[C]//Proceedings of the EMNLP. 2013: 1213-1223.
[16] Xiao T, Zhu J, Zhang H, et al. NiuTrans: an open source toolkit for phrase-based and syntax-based machine translation[C]//Proceedings of the ACL 2012 System Demonstrations. 2012: 19-24.
[17] Papineni K, Roukos S, Ward T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318.
{{custom_fnGroup.title_cn}}
脚注
{{custom_fn.content}}
基金
国家自然科学基金(61370130,61473294);中央高校基本科研业务费专项资金资助(2015JBM033);国家国际科技合作专项资助(2014DFA11350)
{{custom_fund}}