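This paper proposes a Japanese-to-Chinese chunk-based dependency-to-string statistical machine translation model that integrates case frames. The basic idea is to acquire case frames from Japanese dependency analysis trees. During rule extraction and decoding, the Japanese case frames then serve as constraints that guide how the syntactic structure of the dependency tree is reordered. This bridges the syntactic differences between Japanese and Chinese and integrates case frames into the Japanese-to-Chinese dependency-to-string model. Experimental results show that the proposed method improves both syntactic-structure reordering and lexical translation in Japanese-to-Chinese statistical machine translation, and also improves overall translation quality.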
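The following Python sketch makes "acquiring case frames from a Japanese dependency analysis" concrete. It reads the case-marked dependents of one predicate off a chunk-based dependency tree. This is a minimal illustration under stated assumptions, not the authors' procedure. The `Chunk` class, `case_of`, `extract_case_frame`, and the particle list `CASE_PARTICLES` are hypothetical. The tree is assumed to come from an upstream Japanese chunk-based dependency parser, and topic-marked (は) arguments, whose underlying case is hidden, are ignored. Full case frames, such as those compiled automatically from large corpora, also record which nouns typically fill each slot.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed, simplified set of Japanese case particles. Longer particles come first
# so that, e.g., "まで" is not misread as "で".
CASE_PARTICLES = ("から", "より", "まで", "が", "を", "に", "で", "へ", "と")


@dataclass
class Chunk:
    text: str  # surface form of the chunk (bunsetsu)
    head: int  # index of the chunk this one depends on; -1 marks the root


def case_of(chunk: Chunk) -> Optional[str]:
    """Return the case particle that ends the chunk, or None if there is none."""
    for particle in CASE_PARTICLES:
        if chunk.text.endswith(particle):
            return particle
    return None


def extract_case_frame(chunks: List[Chunk], predicate: int) -> Dict[str, int]:
    """Map each case particle to the index of the argument chunk that fills it.

    Only direct dependents of the predicate are considered; if a slot is filled
    twice, the later chunk wins (a simplification of real case frames).
    """
    frame: Dict[str, int] = {}
    for i, chunk in enumerate(chunks):
        if chunk.head == predicate:
            case = case_of(chunk)
            if case is not None:
                frame[case] = i
    return frame


# 私が 本を 読む ("I read a book"): both arguments depend on the predicate chunk 2.
chunks = [Chunk("私が", 2), Chunk("本を", 2), Chunk("読む", -1)]
print(extract_case_frame(chunks, 2))  # {'が': 0, 'を': 1}
```

The resulting case-to-argument mapping is the kind of predicate-argument information that a case frame contributes as a constraint.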
Abstract
This paper proposes a method for integrating case frames into a Japanese-to-Chinese chunk-based dependency-to-string model. First, case frames are acquired from the results of Japanese chunk-based dependency analysis. Second, the case frames are used to constrain rule extraction and decoding in the chunk-based dependency-to-string model. Experimental results on Japanese-to-Chinese test sets show that the proposed method handles long-distance structural reordering and lexical translation well, and that it outperforms both a hierarchical phrase-based model and a word-based dependency-to-string model.
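As a rough intuition for how case-frame slots can constrain reordering, the Python sketch below applies them as a direct reordering rule to a single predicate-final clause. This is a pre-ordering-style simplification, not the rule-extraction and decoding integration described above. The function `reorder_by_case_frame` and the slot-to-position table `SLOT_SIDE` are hypothetical. The table is an assumed, deliberately coarse approximation of Chinese word order; for example, に-marked recipients can also follow the verb in Chinese.

```python
from typing import Dict, List

# Hypothetical placement of Japanese case slots relative to the Chinese verb:
# subjects and most adjunct phrases precede the verb, direct objects follow it.
SLOT_SIDE = {"が": "pre", "を": "post", "に": "pre", "で": "pre", "から": "pre"}


def reorder_by_case_frame(chunks: List[str], predicate: int,
                          frame: Dict[str, int]) -> List[str]:
    """Move a predicate-final Japanese clause toward Chinese SVO order.

    chunks:    chunk strings in Japanese order
    predicate: index of the predicate chunk
    frame:     {case particle: argument chunk index}
    Chunks without a case slot stay before the verb, in their original order.
    """
    case_at = {index: case for case, index in frame.items()}
    before: List[str] = []
    after: List[str] = []
    for i, text in enumerate(chunks):
        if i == predicate:
            continue
        side = SLOT_SIDE.get(case_at.get(i, ""), "pre")
        if side == "post":
            after.append(text)
        else:
            before.append(text)
    return before + [chunks[predicate]] + after


# 私が 図書館で 本を 読む -> 我 在图书馆 读 书 ("I read a book in the library")
chunks = ["私が", "図書館で", "本を", "読む"]
frame = {"が": 0, "で": 1, "を": 2}
print(reorder_by_case_frame(chunks, 3, frame))
# ['私が', '図書館で', '読む', '本を']
```

Keying the decision on the case slot rather than on surface position is the point of the illustration. The を-marked object moves after the verb wherever it appears in the Japanese clause.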
Keywords
Japanese-to-Chinese SMT /
case frame /
dependency-to-string model /
syntactic structure
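Funding
National Natural Science Foundation of China (61370130), International Science and Technology Cooperation Program of China (No. 2014DFA11350), Beijing Jiaotong University Talent Fund (2011RC034).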