Abstract
The bilingual corpus is one of the most important components of a translation memory system. A central problem in translation memory research is how to extract, from a bilingual corpus of limited size, more association examples that meet the user's current translation needs. This paper first analyzes the limitations of existing example retrieval algorithms that rely on a single method. Then, building on a knowledge-based representation of the bilingual corpus, it proposes a multi-strategy mechanism for extracting association examples. The mechanism combines sentence syntactic structure (tree) matching, sentence edit-distance calculation, phrase chunk matching, lexical semantic generalization, and preference selection based on extended information (such as sentence source, subject domain, and usage frequency). Experimental results show that the method increases both the number and the quality of the association examples recalled and clearly improves the assistance provided to users.
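To make one of the strategies named in the abstract concrete, the following is a minimal sketch of edit-distance-based example retrieval. It is an illustration only, not the authors' implementation: the function names (`edit_distance`, `similarity`, `retrieve`), the whitespace tokenization, the length-normalized score, and the 0.5 threshold are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Distance normalized by the longer length (assumed scoring scheme)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def retrieve(query: str,
             memory: List[Tuple[str, str]],
             threshold: float = 0.5) -> List[Tuple[float, str, str]]:
    """Return (score, source, target) triples at or above the threshold, best first."""
    q = query.lower().split()  # naive whitespace tokenization (assumption)
    scored = [(similarity(q, src.lower().split()), src, tgt)
              for src, tgt in memory]
    hits = [h for h in scored if h[0] >= threshold]
    return sorted(hits, key=lambda h: h[0], reverse=True)
```

A single score of this kind only rewards surface overlap, which is the sort of limitation that motivates combining it with structural, phrase-level, and semantic strategies as the abstract describes.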
Key words
artificial intelligence /
machine translation /
bilingual knowledge corpus /
association example /
multi-strategy extraction mechanism /
translation memory
Funding
Supported by the National Natural Science Foundation of China (60573185, 60520130299)