Abstract
In a system for processing handwritten Chinese mail envelopes, segmenting the address line into characters is a key step toward address recognition. This paper proposes a character segmentation algorithm tailored to the characteristics of handwritten address characters on postal envelopes. First, structure-based methods (vertical projection, connected component extraction, and stroke crossing number analysis) perform an initial segmentation of the address-line image into a sequence of basic blocks. Next, adjacent blocks are merged to form multiple candidate segmentation paths. These paths are then evaluated using character recognition confidence and prior knowledge from the postal destination address database, and the best path is selected. Tested on more than 500 real envelope images captured by a postal sorting machine, the algorithm achieved a correct rate of 78.61% using address-line recognition alone, and a correct sorting rate of 95.42% when address-line recognition was combined with postcode recognition.
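The two stages summarized in the abstract (initial segmentation, then search over merged blocks) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the initial cut uses only zero columns of a vertical projection profile, and it replaces the recognition confidence and address-database evaluation with a hypothetical `width_score` that simply prefers segments close to an assumed character width. The names `initial_blocks`, `best_path`, `width_score`, and the parameter `max_merge` are all introduced here for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Block = Tuple[int, int]  # (start_column, end_column), end exclusive


def initial_blocks(projection: Sequence[int]) -> List[Block]:
    """Cut the line at empty columns of a vertical projection profile."""
    blocks: List[Block] = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a run of ink begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # the run ends at an empty column
            start = None
    if start is not None:                  # run reaches the right edge
        blocks.append((start, len(projection)))
    return blocks


def best_path(blocks: Sequence[Block],
              score: Callable[[Block], float],
              max_merge: int = 3) -> List[Block]:
    """Dynamic programming over all ways of merging adjacent blocks.

    Each candidate character is a run of 1..max_merge neighboring blocks;
    the path with the highest total score is returned.
    """
    n = len(blocks)
    best = [float("-inf")] * (n + 1)       # best[i]: top score for blocks[:i]
    back = [0] * (n + 1)                   # back[i]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for k in range(1, min(max_merge, end) + 1):
            begin = end - k
            span = (blocks[begin][0], blocks[end - 1][1])
            total = best[begin] + score(span)
            if total > best[end]:
                best[end], back[end] = total, begin
    path: List[Block] = []
    end = n
    while end > 0:                         # follow back-pointers to the start
        begin = back[end]
        path.append((blocks[begin][0], blocks[end - 1][1]))
        end = begin
    return path[::-1]


def width_score(span: Block, expected_width: int = 10) -> float:
    """Hypothetical stand-in for recognition confidence (assumption)."""
    return -abs((span[1] - span[0]) - expected_width)


# Toy profile: two narrow strokes close together, then one wide character.
projection = [0, 3, 3, 3, 3, 0, 2, 2, 2, 2, 0, 0] + [5] * 10 + [0]
blocks = initial_blocks(projection)
segments = best_path(blocks, width_score)
```

In this toy profile the two narrow blocks are merged into one candidate character because the merged width is closer to the assumed character width. In a real system, the score for each span would come from a character classifier and from matching the resulting string against an address lexicon.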
Key words
artificial intelligence /
pattern recognition /
postal mail address /
offline handwritten Chinese character /
character segmentation /
OCR
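Funding
Supported by the National High Technology Research and Development Program of China (863 Program), grant No. 2001AA114130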