Abstract
The phrase-based statistical translation model is one of the most widely used models in machine translation. However, because decoding relies on exact phrase matching, the model suffers from severe data sparseness, and a large number of phrases in the phrase table are never fully exploited. To address this, we propose an interactive translation approach based on human-machine cooperation. For a source phrase that cannot be found in the phrase table, the system first uses fuzzy matching to retrieve similar phrases from the table. A combined classifier then judges which of these similar phrases are likely to improve the translation quality of the sentence. Finally, through human-machine interaction, a phrase is selected that is likely to improve translation quality while preserving the meaning of the original sentence. Experiments on a spoken-language corpus show that the approach effectively improves the translation quality of the system.
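The fuzzy-matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so the word-level edit distance, the `max_distance` threshold, the function names, and the toy phrase table below are all assumptions made for illustration, not the authors' implementation. The classifier and human-selection stages are not sketched here.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similar_phrases(unknown: str,
                    phrase_table: Dict[str, str],
                    max_distance: int = 1) -> List[Tuple[str, str, int]]:
    """Return (source, target, distance) for table entries near `unknown`.

    Exact matches (distance 0) are skipped, because the sketch targets
    phrases that the table does not contain. `max_distance` is an
    assumed threshold, not a value taken from the paper.
    """
    unknown_tokens = unknown.split()
    candidates = []
    for source, target in phrase_table.items():
        distance = edit_distance(unknown_tokens, source.split())
        if 0 < distance <= max_distance:
            candidates.append((source, target, distance))
    # Closest candidates first; ties are broken alphabetically.
    return sorted(candidates, key=lambda c: (c[2], c[0]))


if __name__ == "__main__":
    # Hypothetical phrase table: source phrase -> placeholder translation.
    toy_table = {
        "a single room": "T1",
        "a double room": "T2",
        "the train station": "T3",
    }
    for candidate in similar_phrases("a twin room", toy_table):
        print(candidate)
```

In the approach the abstract describes, candidates such as these would then be filtered by the combined classifier, and the user would confirm which remaining candidate preserves the meaning of the original sentence.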
Key words: artificial intelligence; machine translation; spoken language translation; phrase-based statistical machine translation; human-machine interaction; fuzzy matching
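Funding
Supported by the National Natural Science Foundation of China (60575043, 60736014); the National Key Technology R&D (Support) Program (2006BAH03B02); and the National 863 Program (2006AA01Z194).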