Abstract
Based on a comparison of the layout characteristics of Chinese and foreign-language documents, this paper identifies the main difficulties in Chinese layout analysis and summarizes the corresponding combined layout features. Using these features, a Chinese layout analysis method is built that is mainly bottom-up while also incorporating some methods and results of the top-down approach. Experimental results show that the method can analyze reasonably standard Chinese layouts. Compared with existing approaches, it is more efficient and better suited to processing Chinese layouts.
Key words
Layout analysis /
Character recognition /
Combined features /
Connected region