Abstract
Based on the error types commonly found in text awaiting proofreading, this paper describes how to construct an error-correction knowledge base and presents an automatic correction algorithm built on that knowledge base. The algorithm exploits the features of the erroneous string together with heuristic information from the surrounding context, and can effectively provide correction suggestions for errors such as wrong characters, missing characters, superfluous characters, transposed characters, and multi-character substitutions. The ranking of the correction suggestions is also discussed.
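As a rough illustration of how suggestions for four of these error types might be generated and ordered, the Python sketch below looks up single-edit variants of a suspect string in a word list. It is not the algorithm of the paper: the function names `suggest` and `rank`, the toy `LEXICON` frequencies, and the `CONFUSION` table of easily confused characters are all assumptions made for illustration. Ranking by word frequency merely stands in for the context-based ordering the abstract mentions, and multi-character substitution is left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: word -> frequency, and character -> confusable characters.
LEXICON: Dict[str, int] = {"知识库": 50, "知识": 120, "纠错": 30, "算法": 80}
CONFUSION: Dict[str, Set[str]] = {"只": {"知"}, "发": {"法"}}


def suggest(wrong: str, lexicon: Dict[str, int],
            confusion: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (candidate, error_type) pairs one edit away from `wrong`."""
    found: Dict[str, str] = {}

    def add(cand: str, kind: str) -> None:
        # Keep only lexicon entries; the first error type found wins.
        if cand in lexicon and cand != wrong and cand not in found:
            found[cand] = kind

    # Wrong character: swap one character for a confusable one.
    for i, ch in enumerate(wrong):
        for alt in confusion.get(ch, ()):
            add(wrong[:i] + alt + wrong[i + 1:], "wrong character")

    # Superfluous character: delete one character.
    for i in range(len(wrong)):
        add(wrong[:i] + wrong[i + 1:], "superfluous character")

    # Missing character: a lexicon word that yields `wrong` after one deletion.
    for word in lexicon:
        if len(word) == len(wrong) + 1 and any(
            word[:i] + word[i + 1:] == wrong for i in range(len(word))
        ):
            add(word, "missing character")

    # Transposition: swap two adjacent characters.
    for i in range(len(wrong) - 1):
        add(wrong[:i] + wrong[i + 1] + wrong[i] + wrong[i + 2:], "transposition")

    return list(found.items())


def rank(cands: List[Tuple[str, str]],
         lexicon: Dict[str, int]) -> List[Tuple[str, str]]:
    """Order candidates by descending lexicon frequency (a stand-in score)."""
    return sorted(cands, key=lambda c: -lexicon[c[0]])


if __name__ == "__main__":
    for s in ["只识库", "知库", "知识库库", "知库识"]:
        print(s, rank(suggest(s, LEXICON, CONFUSION), LEXICON))
```

A real system would replace the frequency score with evidence from the neighbouring words, which is where the contextual heuristics described above would come in.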
Key words
error-correction knowledge base /
correction suggestion /
correction algorithm /
likelihood matching
Funding
Natural Science Foundation of Shanxi Province (981031)