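Aimed at research on continuous recognition of handwritten character sequence input signals, this paper analyzes the characteristics of Chinese characters and on-line handwritten text, and proposes and constructs a set of handwritten Chinese character radicals. Based on this set, radical decomposition and coding of the 6,763 Chinese characters in GB2312-80 was completed and the radical set was tested. Statistics on the coding data show that the number of characters as a function of the number of handwritten radicals follows a log-normal distribution. The character-composing capability of the handwritten radicals is analyzed and discussed from the perspectives of statistics and character recognition technology; the design of the radical set meets the design requirements in both radical selection and character decomposition. Experiments show that radical recognizers built on the handwritten radicals achieve radical recognition rates of 70.21% and 58.49% on isolated handwritten characters and continuous handwritten characters, respectively.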
Abstract
The paper introduces a handwritten Chinese character radical set designed for research on continuous handwritten character sequence recognition. Based on the set, radical-based splitting and coding of 6,763 Chinese characters was completed. The statistical data show that the distribution of the number of Chinese characters with respect to the number of radicals fits a log-normal distribution model. Furthermore, the composing power of the handwritten radicals is analyzed and discussed from the viewpoints of statistics and character recognition techniques. Finally, two radical recognizers were built for testing. A radical recognition rate of 70.21% was obtained in the single-character test, while the rate in the continuous handwritten character sequence test was 58.49%. The results show that the radical set accords with the characteristics of Chinese characters and on-line handwritten text.
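Keywords
artificial intelligence /
pattern recognition /
continuous character recognition /
handwritten Chinese character radical /
log-normal distribution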
Key words
artificial intelligence /
pattern recognition /
continuous character recognition /
handwritten Chinese character radical /
logarithmic normal distribution