Abstract: In recent years, phrase-based statistical machine translation has attracted growing attention for its strong translation performance. However, the model relies on exact matching during decoding, which makes data sparseness a serious problem. On the one hand, some source phrases become “unknown phrases” because they have no exact match in the phrase table; on the other hand, most of the phrases in the phrase table are never used during translation. We therefore propose a novel translation approach based on fuzzy phrase matching and sentence expansion. For a phrase that is absent from the phrase table, i.e., an unknown phrase, we find similar phrases in the phrase table through fuzzy matching. The sentence is then expanded by replacing the original phrase with each similar phrase, and the expanded sentences are translated into the target language. Finally, a combination of multiple classifiers is employed to select the best translation. Experimental results show that this approach significantly improves translation quality.
Key words: artificial intelligence; machine translation; phrase-based statistical machine translation; fuzzy matching; classifier combination
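The fuzzy matching and sentence expansion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the word-level edit distance used here is an assumption, as are the function names, the `threshold` and `top_k` parameters, and the toy phrase table; the translation and classifier-combination stages are not shown.

```python
from typing import List, Sequence, Tuple

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; 1.0 means the phrases are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def similar_phrases(phrase: Sequence[str],
                    phrase_table: List[Tuple[str, ...]],
                    threshold: float = 0.6,   # hypothetical cut-off
                    top_k: int = 2) -> List[Tuple[str, ...]]:
    """Return up to top_k phrase-table entries whose similarity to the phrase meets the threshold."""
    scored = [(similarity(phrase, cand), cand) for cand in phrase_table]
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: -item[0])
    return [cand for _, cand in scored[:top_k]]

def expand_sentence(tokens: List[str], start: int, end: int,
                    phrase_table: List[Tuple[str, ...]]) -> List[List[str]]:
    """Return the original sentence followed by variants in which the
    unknown phrase tokens[start:end] is replaced by each similar phrase."""
    variants = [list(tokens)]
    for cand in similar_phrases(tokens[start:end], phrase_table):
        variants.append(tokens[:start] + list(cand) + tokens[end:])
    return variants

# Toy phrase table (source side only); purely illustrative.
table = [("book", "a", "room"), ("book", "a", "ticket"), ("a", "room")]
sentence = ["i", "want", "to", "book", "one", "room"]
expanded = expand_sentence(sentence, 3, 6, table)
```

In this sketch each expanded sentence would be passed to the decoder separately, and the best of the resulting translations would then be chosen by the selection stage.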