Abstract
Extraction of mathematical expressions is a prerequisite for recognizing printed mathematical expressions. New methods are proposed for locating both isolated and embedded expressions in printed Chinese technical documents. After the features of each text line are extracted, an adaptive neuro-fuzzy inference system (ANFIS) classifies the lines into two classes, lines of text and lines of isolated expressions, which yields the isolated expressions. For embedded expressions, fuzzy clustering and a dynamic programming algorithm are applied to extract Chinese characters, Chinese punctuation marks, and English letters in sequence; finally, heuristic rules merge the remaining mathematical symbols into embedded expressions. Experiments show that the proposed methods achieve high accuracy.
Key words
artificial intelligence /
pattern recognition /
mathematical expression extraction /
adaptive neuro-fuzzy inference system (ANFIS) /
fuzzy clustering /
Chinese-English separation