A Flexible-Scale-Based Method for Phrase Translation Extraction

HE Yan-qing, ZHOU Yu, ZONG Cheng-qing, WANG Xia

Journal of Chinese Information Processing, 2007, Vol. 21, Issue (5): 91-95.
Review

A Flexible-Scale-Based Method for Phrase Translation Extraction

  • HE Yan-qing1, ZHOU Yu1, ZONG Cheng-qing1, WANG Xia2

Abstract

Phrase pair extraction is a key technique in phrase-based statistical machine translation. The widely used extraction method proposed by Och depends heavily on the word alignment, so it can extract only phrase pairs that are fully consistent with that alignment. This paper proposes a phrase pair extraction method based on a "flexible scale": for phrase pairs that are not fully consistent with the word alignment, part-of-speech (POS) tags and dictionary information are used to decide whether to extract them. Relaxing the "fully consistent" constraint lets the method find target phrases for more source phrases, including phrase pairs that Och's method cannot obtain. Experiments show that the proposed method significantly outperforms Och's method.
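To make the "fully consistent" constraint concrete, the following is a minimal Python sketch of alignment-consistent phrase pair extraction with an optional relaxation hook. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `accept` callback, the toy sentence pair, and the toy dictionary are all hypothetical, and the abstract does not specify how POS and dictionary information are actually combined.

```python
from typing import Callable, List, Optional, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def link_counts(alignment: Set[Link], s1: int, s2: int,
                t1: int, t2: int) -> Tuple[int, int]:
    """Count links inside the box [s1..s2] x [t1..t2] and links that
    touch the box on only one side (these break full consistency)."""
    inside = crossing = 0
    for i, j in alignment:
        in_src = s1 <= i <= s2
        in_tgt = t1 <= j <= t2
        if in_src and in_tgt:
            inside += 1
        elif in_src or in_tgt:
            crossing += 1
    return inside, crossing


def extract(src: List[str], tgt: List[str], alignment: Set[Link],
            max_len: int = 3,
            accept: Optional[Callable[[List[str], List[str]], bool]] = None
            ) -> List[Tuple[str, str]]:
    """Strict extraction when accept is None. With accept given, a pair
    that has crossing links is still kept if the callback approves it
    (hypothetical stand-in for a POS/dictionary test)."""
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    inside, crossing = link_counts(alignment, s1, s2, t1, t2)
                    if inside == 0:
                        continue  # require at least one link in the box
                    s_words, t_words = src[s1:s2 + 1], tgt[t1:t2 + 1]
                    if crossing == 0 or (accept is not None
                                         and accept(s_words, t_words)):
                        pairs.append((" ".join(s_words), " ".join(t_words)))
    return pairs


# Toy data (assumed): link (1, 2) is a spurious alignment error.
src = ["他", "喜欢", "苹果"]
tgt = ["he", "likes", "apples"]
alignment = {(0, 0), (1, 1), (2, 2), (1, 2)}

# Hypothetical relaxation: keep single-word pairs found in a toy dictionary.
toy_dict = {("喜欢", "likes"), ("苹果", "apples")}


def in_dictionary(s_words: List[str], t_words: List[str]) -> bool:
    return (len(s_words) == 1 and len(t_words) == 1
            and (s_words[0], t_words[0]) in toy_dict)


strict = extract(src, tgt, alignment)
relaxed = extract(src, tgt, alignment, accept=in_dictionary)
```

In this toy case the single spurious link prevents strict extraction from producing the one-word pairs for 喜欢 and 苹果, while the relaxed run recovers them. A real relaxation test could also consult POS tags, as the abstract describes.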

Key words

artificial intelligence / machine translation / phrase pair extraction / statistical machine translation / flexible scale
 

Cite this article

HE Yan-qing, ZHOU Yu, ZONG Cheng-qing, WANG Xia. A Flexible-Scale-Based Method for Phrase Translation Extraction. Journal of Chinese Information Processing. 2007, 21(5): 91-95

References

[1] Daniel Marcu, William Wong. A Phrase-based, Joint Probability Model for Statistical Machine Translation [A]. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) [C]. Philadelphia, PA, USA: 2002.
[2] Dekai Wu. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora [J]. Computational Linguistics, 1997, 23(3): 377-404.
[3] Ying Zhang, Stephan Vogel, Alex Waibel. Integrated phrase segmentation and alignment algorithm for statistical machine translation [A]. In: Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering [C]. Beijing: 2003.
[4] Ying Zhang, Stephan Vogel. Competitive Grouping in Integrated Phrase Segmentation and Alignment Model [A]. In: Proceedings of the ACL Workshop on Building and Using Parallel Texts [C]. Ann Arbor: 2005. 159-162.
[5] H Kaji, Y Kida, Y Morimoto. Learning Translation Templates from Bilingual Texts [A]. In: Proceedings of the 14th International Conference on Computational Linguistics [C]. Nantes, France: 1992. 672-678.
[6] Franz Josef Och, Hermann Ney. The alignment template approach to statistical machine translation [J]. Computational Linguistics, 2004, 30(4): 417-449.
[7] David Chiang. A Hierarchical Phrase-Based Model for Statistical Machine Translation [A]. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics [C]. Ann Arbor: 2005.
[8] Philipp Koehn. Pharaoh: a beam search decoder for phrase-based SMT [A]. In: Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA) [C]. Washington, District of Columbia: 2004. 115-124.
[9] Eric Brill. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging [J]. Computational Linguistics, 1995, 21(4): 543-565.

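Funding

Supported by the National Natural Science Foundation of China (60575043, 60121302), the National 863 Program of China (2006AA01Z194), and a cooperative project with the Nokia Research Center (China).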