Abstract
The dependency-to-string model uses translation rules based on head-dependents relation (HDR) fragments, each of which is a tree fragment consisting of a head word and all of its dependents. Such rules capture compositional phenomena in the source language well, such as sentence patterns and phrase patterns, but they are weak at capturing non-compositional phenomena (such as idioms and collocations), which phrases capture easily. To improve the performance of the dependency-to-string model, we propose three ways of incorporating bilingual phrases into it: adding syntactic phrases, adding generalized syntactic phrases, and adding non-syntactic phrases. Experiments show that using all three kinds of phrases together significantly improves the model, by about 1.0 BLEU.
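As a rough illustration of the head-dependents fragments described in the abstract, the following Python sketch lists, for each word that has dependents, the fragment formed by that head and all of its direct dependents. The input encoding (a token list plus a list of head indices, with -1 for the root), the function name `hdr_fragments`, and the example sentence are assumptions made for illustration; this is not the paper's implementation and it does not cover rule extraction or phrase incorporation.

```python
from collections import defaultdict


def hdr_fragments(words, heads):
    """Return one (head, dependents) pair per word that has dependents.

    words: list of tokens (hypothetical input format).
    heads: heads[i] is the index of token i's head, or -1 for the root.
    """
    # Group each token under its head, preserving surface order.
    children = defaultdict(list)
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    # One fragment per head: the head word plus all of its direct dependents.
    fragments = []
    for h in sorted(children):
        fragments.append((words[h], [words[d] for d in children[h]]))
    return fragments


# Hypothetical example: "kicked" heads "she" and "bucket"; "bucket" heads "the".
words = ["she", "kicked", "the", "bucket"]
heads = [1, -1, 3, 1]
fragments = hdr_fragments(words, heads)
```

In this toy sentence the idiom "kicked the bucket" is split across two fragments, which hints at why a contiguous phrase can be a better unit for non-compositional expressions.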
Key words
statistical machine translation /
dependency-to-string model /
generalized syntactic bilingual phrases /
non-syntactic bilingual phrases
Funding
Key Program of the National Natural Science Foundation of China (60736014); National Natural Science Foundation of China (60873167, 90920004); National 863 Key Program (2011AA01A207)