Post-processing Study of Chinese Document Recognition Based on HMM

Li Yuanxiang, Ding Xiaoqing, Liu Changsong

Journal of Chinese Information Processing, 1999, Vol. 13, Issue 4: 30-35.
Review

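Abstract

This paper uses an HMM (Hidden Markov Model) to describe post-processing for Chinese document recognition. The HMM combines two probabilistic models, a Chinese language model and a single-character recognition (SCR) model, so that the information provided by the single-character recognizer is fully exploited. The parameters of the language model are estimated from corpus statistics. The parameters of the SCR model are conditional probabilities, which, as theoretical analysis shows, can be converted into posterior probabilities and estimated in that form. Based on an analysis of the SCR results on a training sample set, a statistical method is proposed for estimating the posterior probabilities of candidate characters. Experiments applying the HMM to offline handwritten Chinese document recognition show that post-processing performance depends not only on the language model but also on accurate estimation of the posterior probabilities.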
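The conversion from conditional to posterior probabilities can be sketched with Bayes' rule under the usual HMM assumptions: a first-order (bigram) language model, and each character image depending only on its own character. This is a standard derivation offered as an illustration, not a reproduction of the paper's own analysis. The notation is introduced here: X = x_1…x_T are the segmented character images, C = c_1…c_T is a candidate text, and c_0 is a sentence-start symbol.

```latex
\begin{aligned}
\hat{C} &= \arg\max_{C} P(C \mid X) = \arg\max_{C} P(C)\,P(X \mid C) \\
        &\approx \arg\max_{C} \prod_{t=1}^{T} P(c_t \mid c_{t-1})\, P(x_t \mid c_t) \\
        &= \arg\max_{C} \prod_{t=1}^{T} P(c_t \mid c_{t-1})\, \frac{P(c_t \mid x_t)\, P(x_t)}{P(c_t)} \\
        &= \arg\max_{C} \prod_{t=1}^{T} P(c_t \mid c_{t-1})\, \frac{P(c_t \mid x_t)}{P(c_t)}
\end{aligned}
```

The last step drops P(x_t) because it is the same for every candidate at position t. Decoding therefore needs only the language-model terms, the candidate posterior P(c_t | x_t), and the character prior P(c_t), never P(x_t | c_t) itself.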

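The decoding step can be illustrated as a Viterbi search over the recognizer's candidate lattice, scoring each candidate by P(c_t | c_{t-1}) · P(c_t | x_t) / P(c_t). This is a minimal sketch under assumptions not taken from the paper: an add-one smoothed character bigram model stands in for the corpus-trained language model, and P(c_t | x_t) is approximated by how often the rank-r candidate was correct on a training set (a hypothetical rank-frequency estimator; the paper's statistical method may use other features). The function names `estimate_rank_posteriors`, `train_bigram`, and `viterbi`, the toy corpus, and the lattice are all illustrative.

```python
import math
from collections import Counter


def estimate_rank_posteriors(true_ranks, k):
    """Hypothetical estimator of P(rank-r candidate is correct), r = 0..k-1.

    true_ranks holds, for each training character, the 0-based rank at which
    the true character appeared in the recognizer's top-k list (None if it
    was absent). Add-one smoothing over the k ranks plus an "absent" outcome
    keeps every probability > 0, so the logarithms below are defined.
    """
    counts = Counter(r for r in true_ranks if r is not None)
    n = len(true_ranks)
    return [(counts[r] + 1) / (n + k + 1) for r in range(k)]


def train_bigram(corpus, alpha=1.0):
    """Return add-alpha smoothed functions bigram(prev, cur) and prior(c)."""
    history, pairs, unigram = Counter(), Counter(), Counter()
    for sentence in corpus:
        prev = "<s>"  # sentence-start symbol
        for ch in sentence:
            pairs[(prev, ch)] += 1
            history[prev] += 1
            unigram[ch] += 1
            prev = ch
    v = len(unigram)               # vocabulary size
    total = sum(unigram.values())  # number of character tokens

    def bigram(prev, cur):
        return (pairs[(prev, cur)] + alpha) / (history[prev] + alpha * v)

    def prior(c):
        return (unigram[c] + alpha) / (total + alpha * v)

    return bigram, prior


def viterbi(candidates, rank_post, bigram, prior):
    """Most probable string through a candidate lattice.

    candidates[t] lists the recognizer's candidates for position t in rank
    order (at most len(rank_post) of them). Each step adds
    log P(c_t | c_{t-1}) + log P(c_t | x_t) - log P(c_t),
    with rank_post[j] standing in for P(c_t | x_t).
    """
    def emit(j, c):
        return math.log(rank_post[j]) - math.log(prior(c))

    delta = [[math.log(bigram("<s>", c)) + emit(j, c)
              for j, c in enumerate(candidates[0])]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        prev_chars = candidates[t - 1]
        row, ptr = [], []
        for j, c in enumerate(candidates[t]):
            # Score every predecessor and keep the best one for candidate c.
            scores = [delta[t - 1][i] + math.log(bigram(p, c))
                      for i, p in enumerate(prev_chars)]
            best_i = max(range(len(scores)), key=scores.__getitem__)
            row.append(scores[best_i] + emit(j, c))
            ptr.append(best_i)
        delta.append(row)
        back.append(ptr)
    # Backtrack from the best final candidate.
    j = max(range(len(delta[-1])), key=delta[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    return "".join(reversed(path))


if __name__ == "__main__":
    corpus = ["文字识别", "汉字识别", "识别方法", "刷新方法"]  # toy corpus
    bigram, prior = train_bigram(corpus)
    rank_post = estimate_rank_posteriors([0] * 6 + [1] * 4, k=2)
    lattice = [["识", "字"], ["刷", "别"]]  # top-1 reading would be "识刷"
    print(viterbi(lattice, rank_post, bigram, prior))  # prints 识别
```

With these made-up inputs the recognizer's top-1 reading is 识刷, and the bigram evidence for 识别 outweighs the rank preference for 刷. One consequence of dividing by P(c_t) is visible in such toys: when the posterior depends only on rank, characters that are rare or unseen in the corpus receive a small prior and hence a boost, so how the prior is smoothed has a direct effect on the result.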

Key words

Chinese Character Recognition / Post-processing / N-gram Language Model / Hidden Markov Model / Posterior Probability

Cite this article

Li Yuanxiang, Ding Xiaoqing, Liu Changsong. Post-processing Study of Chinese Document Recognition Based on HMM. Journal of Chinese Information Processing, 1999, 13(4): 30-35.
