Abstract
This paper presents an improvement to a Chinese character segmentation algorithm based on unit amalgamation. The improved algorithm modifies advanced amalgamation, the core stage of the original algorithm: it searches all units that can be amalgamated for the best amalgamating combination, which avoids certain amalgamation errors that the original algorithm can make during advanced amalgamation. In tests on a number of real samples, the modification eliminates errors that the original algorithm produces in some cases without degrading its performance in other respects, and further improves segmentation accuracy.
Key words
unit amalgamation /
segmentation method /
advanced amalgamation /
best amalgamating combination