Abstract
Statistical methods can resolve the ambiguities that arise during part-of-speech tagging and syntactic and semantic analysis in Chinese-English machine translation. Parameter estimation for these methods usually follows the maximum likelihood principle, but selecting the candidate with the maximum score is not correct in every case. To select the correct result from among all candidates, this paper uses a robust learning algorithm. When the correct candidate does not have the highest score, the algorithm adjusts the scores so that the correct candidate's score becomes the highest and the scores of the incorrect candidates are lowered. Moreover, because the training set and the test set differ, minimizing the error rate on the training set does not guarantee a minimal error rate on the test set. A robust learning algorithm should therefore be used when statistical variation between the training corpus and the test corpus is taken into account.
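The score adjustment described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the additive feature-weight score, the `margin` and `step` parameters, the function names, and the toy tagging features are all assumptions made for illustration. The idea shown is only the general one stated above: if the correct candidate does not lead by a safety margin, raise the weights it uses and lower those of its strongest rival.

```python
def score(weights, features):
    """Score a candidate as the sum of the weights of its features."""
    return sum(weights.get(f, 0.0) for f in features)


def robust_update(weights, candidates, correct, margin=1.0, step=0.5, max_iter=100):
    """Adjust weights until the correct candidate leads every rival by `margin`.

    `candidates` maps a candidate name to its list of features (hypothetical
    representation). Returns a new weight dictionary; the input is not modified.
    """
    weights = dict(weights)
    rivals = [name for name in candidates if name != correct]
    for _ in range(max_iter):
        # Find the strongest incorrect candidate under the current weights.
        best_rival = max(rivals, key=lambda name: score(weights, candidates[name]))
        gap = score(weights, candidates[correct]) - score(weights, candidates[best_rival])
        if gap >= margin:
            break  # the correct candidate already wins with a safety margin
        # Raise the score of the correct candidate ...
        for f in candidates[correct]:
            weights[f] = weights.get(f, 0.0) + step
        # ... and lower the score of its strongest rival.
        for f in candidates[best_rival]:
            weights[f] = weights.get(f, 0.0) - step
    return weights


# Toy example: two competing analyses, with made-up log-probability-like weights.
candidates = {
    "noun_reading": ["DT-N", "N-V"],
    "verb_reading": ["DT-V", "V-V"],
}
initial = {"DT-N": -2.0, "N-V": -1.0, "DT-V": -1.5, "V-V": -1.0}
adjusted = robust_update(initial, candidates, correct="noun_reading")

before_gap = score(initial, candidates["noun_reading"]) - score(initial, candidates["verb_reading"])
after_gap = score(adjusted, candidates["noun_reading"]) - score(adjusted, candidates["verb_reading"])
```

Requiring a margin rather than a bare win is what makes the adjustment tolerant of statistical differences between training and test data: a candidate that wins only narrowly on the training corpus may lose on unseen text.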
Key words
robust learning algorithm /
machine translation /
score
Funding
National Natural Science Foundation of China (69972025)