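Sequence Intersection Based Phrase Translation Extraction from Bilingual Corpus

WANG Chen, SONG Guo-long, WU Hong-lin, ZHANG Li, LIU Shao-ming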

Journal of Chinese Information Processing, 2009, Vol. 23, Issue 1: 38.
Review



Abstract

Phrase translation extraction is one of the key techniques in Example-Based Machine Translation (EBMT), and its accuracy directly affects EBMT system performance. This paper proposes a phrase translation extraction method based on sequence intersection, in which each sentence is treated as a sequence of words. Given a Chinese-Japanese sentence-aligned bilingual corpus, the method first retrieves all source sentences that contain the phrase to be translated. It then intersects the corresponding target sentences pairwise, as sequences, and takes the result as the phrase translation. By fully mining the information in the sentence-aligned corpus, the approach obtains high-quality phrase translations without resources such as word alignment, syntactic parsing, or dictionaries. Experiments show that the extracted phrase translations exceed 80% accuracy.
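The idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the intersection operator, so the code below makes two assumptions that are not taken from the paper: the intersection of two target sentences is their longest shared contiguous run of words, and the final translation is the run that wins the most pairwise votes. The function names `longest_common_run` and `extract_translation`, and the pre-segmented toy corpus, are hypothetical and exist only for this example.

```python
from collections import Counter
from itertools import combinations


def longest_common_run(a, b):
    """Return the longest contiguous word run shared by sequences a and b.

    Assumption: this stands in for the paper's "sequence intersection".
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word of both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return tuple(a[best_end - best_len:best_end])


def extract_translation(phrase, corpus):
    """Pick the most frequent pairwise intersection as the translation.

    corpus: list of (source_words, target_words) sentence pairs.
    Assumption: majority voting is an illustrative selection rule only.
    """
    phrase = tuple(phrase)
    n = len(phrase)
    # Keep target sentences whose source sentence contains the phrase.
    targets = [
        tgt for src, tgt in corpus
        if any(tuple(src[k:k + n]) == phrase for k in range(len(src) - n + 1))
    ]
    votes = Counter()
    for t1, t2 in combinations(targets, 2):
        common = longest_common_run(t1, t2)
        if common:
            votes[common] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical, pre-segmented Chinese-Japanese sentence pairs.
corpus = [
    (["我", "研究", "机器", "翻译"], ["私", "は", "機械", "翻訳", "を", "研究", "する"]),
    (["机器", "翻译", "很", "难"], ["機械", "翻訳", "は", "難しい"]),
    (["他", "喜欢", "机器", "翻译"], ["彼", "は", "機械", "翻訳", "が", "好き", "だ"]),
    (["今天", "天气", "好"], ["今日", "は", "天気", "が", "いい"]),
]

result = extract_translation(["机器", "翻译"], corpus)
print(result)  # ('機械', '翻訳')
```

In this toy corpus one pair of target sentences also shares the particle は before the phrase, so its intersection is one word too long; the other two pairs agree on 機械 翻訳, which is why some selection step over the pairwise results is needed. No word alignment, parser, or dictionary is involved, matching the resource-free setting the abstract describes.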

Keywords

computer application / Chinese information processing / EBMT / phrase translation extraction / sequence intersection

Cite this article

WANG Chen, SONG Guo-long, WU Hong-lin, ZHANG Li, LIU Shao-ming. Sequence Intersection Based Phrase Translation Extraction from Bilingual Corpus. Journal of Chinese Information Processing. 2009, 23(1): 38




