Li Yuanxiang, Ding Xiaoqing, Liu Changsong. Post-processing Study of Chinese Document Recognition Based on HMM[J]. Journal of Chinese Information Processing, 1999, 13(4): 30-35.
Post-processing Study of Chinese Document Recognition Based on HMM
Li Yuanxiang, Ding Xiaoqing, Liu Changsong
Department of Electronic Engineering, Tsinghua University
Abstract: This paper proposes a post-processing method for Chinese document recognition based on a hidden Markov model (HMM). The HMM combines a language model with a single-character recognition (SCR) model to make the best use of the SCR output. The language-model parameters are estimated from a corpus, while the SCR-model parameters are conditional probabilities which, as a theoretical analysis shows, can be converted into posterior probabilities. Based on an analysis of the SCR output, the posterior probabilities of the candidates are then estimated statistically. Experiments on offline Chinese document recognition show that post-processing performance depends not only on a suitable language model but also on efficient evaluation of the posterior probabilities.
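The abstract describes the decoding only at a high level. The sketch below shows one common way such an HMM post-processor can be realized, assuming a bigram language model, a short candidate list per character position, and recognizer scores already converted to posteriors P(c | x_i). By Bayes' rule, P(x_i | c) = P(c | x_i) P(x_i) / P(c); since P(x_i) is the same for every candidate at position i, the emission term can be replaced by P(c | x_i) / P(c) inside Viterbi search. The function name `viterbi_postprocess`, the probability floor used in place of proper smoothing, and every probability in the toy example are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_postprocess(
    candidates: Sequence[Dict[str, float]],
    unigram: Dict[str, float],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Choose the most probable character string from per-position SCR candidates.

    candidates[i] maps each candidate character at position i to its
    recognizer posterior P(c | x_i). unigram[c] is P(c) and
    bigram[(a, b)] is P(b | a), both assumed to be estimated from a corpus.
    (Hypothetical interface for illustration.)
    """
    if not candidates:
        return []

    def log(p: float) -> float:
        # Crude flooring stands in for real language-model smoothing (assumption).
        return math.log(max(p, floor))

    # Position 0: P(c) * P(x_0 | c) is proportional to the posterior P(c | x_0).
    delta = {c: log(post) for c, post in candidates[0].items()}
    back: List[Dict[str, str]] = []  # back[k][c] = best predecessor of c at position k + 1

    for column in candidates[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for c, post in column.items():
            # Best predecessor under the transition probability P(c | prev).
            prev = max(delta, key=lambda p: delta[p] + log(bigram.get((p, c), 0.0)))
            # Emission P(x_i | c) replaced by P(c | x_i) / P(c); P(x_i) is dropped
            # because it is identical for every candidate at this position.
            new_delta[c] = (
                delta[prev]
                + log(bigram.get((prev, c), 0.0))
                + log(post)
                - log(unigram.get(c, 0.0))
            )
            ptr[c] = prev
        delta = new_delta
        back.append(ptr)

    # Trace back from the best-scoring final candidate.
    path = [max(delta, key=delta.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy, made-up numbers: the recognizer slightly prefers the look-alike 申 over 中.
cands = [{"申": 0.55, "中": 0.45}, {"国": 0.9, "图": 0.1}]
uni = {"中": 0.01, "申": 0.001, "国": 0.01, "图": 0.002}
bi = {("中", "国"): 0.2, ("申", "国"): 0.001, ("中", "图"): 0.001, ("申", "图"): 0.001}
print("".join(viterbi_postprocess(cands, uni, bi)))  # prints 中国
```

In the toy input the recognizer ranks 申 first at position 0, but the bigram P(国 | 中) outweighs that small posterior margin, so decoding returns 中国. A practical system would replace the floor with a properly smoothed language model estimated from a corpus, and the quality of the posterior estimates P(c | x_i) directly shapes which path wins, which is consistent with the abstract's conclusion.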