Research on the Post-processing of Manchu Character Recognition Based on Hidden Markov Model

ZHAO Ji, LI Jing-jiao, WANG Li-jun, ZHANG Ji-sheng

Journal of Chinese Information Processing, 2006, Vol. 20, Issue (4): 65-69.

Abstract

This study proposes a post-processing method to improve the performance of Manchu text recognition. The recognition output of a Manchu word recognition system is combined with Manchu phrase information, and a statistical database of Manchu phrases and candidate word sets is built. An evaluation model based on the Bayes rule is used to estimate the probability of the candidate Manchu words, taking into account both the posterior probability of each candidate and the prior probability of Manchu phrases. A hidden Markov model and the Viterbi dynamic programming algorithm, supported by a data structure that is sound and easy to implement, are used to check the output of the recognizer and to detect and correct rejected and misrecognized words, which raises the recognition rate of the Manchu text recognition system. The experiments indicate that post-processing performance depends not only on the language model but also on how accurately the probabilities are estimated. In addition, a higher recognition rate in the SCR (Single Character Recognition) stage yields stronger error correction in post-processing.
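The decoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_correct`, the `floor` smoothing constant, the romanized example tokens, and all probabilities below are assumptions made up for illustration. Each position holds a set of candidate words with recognizer confidences (standing in for the posterior probabilities), a bigram table stands in for the phrase prior, and a Viterbi pass picks the word sequence with the highest combined score.

```python
import math

def viterbi_correct(candidates, bigram, floor=1e-6):
    """Pick the most probable word sequence from per-position candidates.

    candidates: list of dicts {word: recognizer confidence in (0, 1]}.
    bigram: dict {(previous_word, word): P(word | previous_word)}.
    floor: assumed fallback probability for unseen word pairs.
    """
    # best[word] = (log score of the best path ending in word, that path)
    best = {w: (math.log(p), [w]) for w, p in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for word, conf in position.items():
            # Choose the predecessor that maximizes path score plus transition.
            score, path = max(
                (
                    (prev_score + math.log(bigram.get((prev, word), floor)), prev_path)
                    for prev, (prev_score, prev_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[word] = (score + math.log(conf), path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical data: the recognizer alone would prefer "gurgu" at position 2.
candidates = [
    {"amba": 0.9, "amta": 0.1},
    {"gurun": 0.4, "gurgu": 0.6},
]
bigram = {("amba", "gurun"): 0.5, ("amba", "gurgu"): 0.01}

corrected = viterbi_correct(candidates, bigram)
print(corrected)
```

In this toy case the phrase prior outweighs the recognizer's top choice at the second position, which is the kind of correction the abstract attributes to post-processing. Rejected words are not modeled here; one possible treatment, again an assumption, would be to give a rejected position a broad candidate set with uniform confidences.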

Key words

computer application / Chinese information processing / Manchu / post-processing / confusion matrix / Bayes rule / feature vector

Cite this article

ZHAO Ji, LI Jing-jiao, WANG Li-jun, ZHANG Ji-sheng. Research on the Post-processing of Manchu Character Recognition Based on Hidden Markov Model. Journal of Chinese Information Processing. 2006, 20(4): 65-69


Funding

Supported by the Natural Science Foundation of Liaoning Province (2001113)