Approach to Statistical Machine Translation Based on Phrase Fuzzy-Matching and Sentence Expansion

LIU Peng, ZONG Chengqing

Journal of Chinese Information Processing, 2009, Vol. 23, Issue 5: 40-47.
Review

Abstract

In recent years, phrase-based statistical translation models have attracted wide attention in machine translation research and have achieved good translation performance. However, current phrase-based systems match phrases exactly during decoding, which often leads to data sparseness. On the one hand, some phrases have no exact match in the phrase table obtained from training and therefore become unknown phrases; on the other hand, a large number of phrases in the phrase table are never fully used. To address this, we propose a translation approach based on phrase fuzzy matching and sentence expansion. For a phrase that does not exist in the phrase table, we use fuzzy matching to find phrases similar to it, replace the original phrase with each similar phrase to generate expanded sentences, and then translate all of the expanded sentences. Because not every expanded sentence improves on the translation of the original sentence, a combination classifier is applied after translation to select the best translation result. Experiments show that this approach effectively improves the translation quality of the system.
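The fuzzy-matching and sentence-expansion steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the phrase table is a plain dict keyed by token tuples, similarity is measured with `difflib.SequenceMatcher` as a stand-in for whatever measure the authors use, phrases have a fixed length `n`, and the function names and the `threshold` value are hypothetical. The classifier-based selection of the best translation is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Token-level similarity in [0, 1]; a stand-in measure, not the paper's."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(phrase, phrase_table, threshold=0.6):
    """Return same-length phrases from the table that resemble `phrase`, best first."""
    scored = [
        (similarity(phrase, cand), cand)
        for cand in phrase_table
        if len(cand) == len(phrase) and cand != phrase
    ]
    return [cand for score, cand in sorted(scored, reverse=True) if score >= threshold]


def expand_sentence(tokens, phrase_table, n=3, threshold=0.6):
    """Replace each unknown n-gram with similar known phrases.

    Each expanded sentence contains exactly one replacement; duplicates are dropped.
    """
    expanded, seen = [], set()
    for i in range(len(tokens) - n + 1):
        phrase = tuple(tokens[i:i + n])
        if phrase in phrase_table:
            continue  # exact match exists, so this phrase is not unknown
        for cand in fuzzy_matches(phrase, phrase_table, threshold):
            sentence = tokens[:i] + list(cand) + tokens[i + n:]
            if tuple(sentence) not in seen:
                seen.add(tuple(sentence))
                expanded.append(sentence)
    return expanded


# Hypothetical toy phrase table: source phrase -> target phrase.
toy_table = {
    ("book", "a", "single"): "订 一 个 单人",
    ("a", "single", "room"): "一 个 单人 间",
}
candidates = expand_sentence(["book", "a", "twin", "room"], toy_table)
```

In this toy setup both unknown trigrams map to the same expanded sentence, so a single candidate remains. A full system would translate the original sentence and every candidate, then pass the translations to the selection step.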

Key words

artificial intelligence / machine translation / phrase-based statistical machine translation / fuzzy matching / combination classifier

Cite this article

LIU Peng, ZONG Chengqing. Approach to Statistical Machine Translation Based on Phrase Fuzzy-Matching and Sentence Expansion. Journal of Chinese Information Processing. 2009, 23(5): 40-47

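Funding

Supported by the National Natural Science Foundation of China (60575043, 60736014) and the National High-Tech R&D Program of China (863 Program) (2006AA01Z194, 2006AA010108).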