Extraction of Mathematical Expressions in Printed Chinese Technical Documents (中文科技文档中的数学表达式定位)

ZHANG Zhi-wei, KONG Fan-rang, LIU Wei-lai, LONG Qian, LIU Yong-bin

Journal of Chinese Information Processing (中文信息学报), 2007, Vol. 21, Issue (4): 86.
Article

Abstract

Locating mathematical expressions is a prerequisite for recognizing printed mathematical expressions. This paper presents new methods for locating both isolated and embedded expressions in printed Chinese technical documents. For isolated expressions, features are extracted from each text line, and an adaptive neuro-fuzzy inference system (ANFIS) classifies the lines into two classes: lines of ordinary text and lines of isolated expressions. For embedded expressions, fuzzy clustering and a dynamic programming algorithm are applied to extract Chinese characters, Chinese punctuation marks, and English characters in sequence; heuristic rules are then used to merge the remaining mathematical symbols into embedded expressions. Experiments show that the proposed methods locate expressions with high accuracy.
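The abstract names fuzzy clustering as one ingredient of the embedded-expression step, but this page does not state which features or which clustering variant the authors use. As a rough illustration only, the sketch below applies standard fuzzy c-means to hypothetical bounding-box widths of connected components, on the assumption that Chinese characters are wider than Latin letters and symbols. The function names, the width feature, and the sample data are all assumptions made for this example and are not taken from the paper.

```python
def memberships_for(values, centers, m):
    """Return fuzzy membership rows (one per value, one entry per center)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centers]
        if min(dists) == 0.0:
            # The value coincides with a center: full membership in that cluster.
            hit = dists.index(0.0)
            rows.append([1.0 if k == hit else 0.0 for k in range(len(centers))])
        else:
            # Standard FCM rule: u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1)).
            rows.append([1.0 / sum((dk / dj) ** power for dj in dists)
                         for dk in dists])
    return rows


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Cluster scalar values with fuzzy c-means; return (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly across the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    for _ in range(n_iter):
        u = memberships_for(values, centers, m)
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total == 0.0:
                # Empty cluster: keep its previous center.
                new_centers.append(centers[k])
            else:
                # Center = membership-weighted mean of the data.
                new_centers.append(
                    sum(w * x for w, x in zip(weights, values)) / total)
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(values, centers, m)


# Hypothetical component widths in pixels (assumed data, not from the paper):
# values near 30 stand in for Chinese characters, values near 12 for Latin
# letters or symbols.
widths = [30, 31, 29, 30, 32, 12, 14, 13, 30, 11]
centers, u = fuzzy_c_means_1d(widths)
# Hard label = index of the cluster with the largest membership.
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
print(centers, labels)
```

In this toy setup the cluster with the larger center would be read as the Chinese-character group. A real system would need richer features than width alone, and the paper's dynamic programming and heuristic merging steps are not modeled here.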

Key words

artificial intelligence / pattern recognition / mathematical expression extraction / adaptive neuro-fuzzy inference system (ANFIS) / fuzzy clustering / Chinese-English separation

Cite this article

ZHANG Zhi-wei, KONG Fan-rang, LIU Wei-lai, LONG Qian, LIU Yong-bin. Extraction of Mathematical Expressions in Printed Chinese Technical Documents. Journal of Chinese Information Processing. 2007, 21(4): 86
