A Multi-task Approach to Lao Character Recognition with Structural Features

CHEN Zhuo, ZHOU Lanjiang, HAO Yongbin, ZHANG Jian'an

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (4) : 34-44.
Information Processing for Ethnic, Cross-border and Neighboring Languages

Abstract

Lao is a low-resource language, and Lao text corpora are difficult to collect directly from the Internet; optical character recognition (OCR) of Lao script makes it possible to obtain more Lao text from the limited supply of text images. This paper addresses three problems in Lao OCR: mis-segmentation of single Lao characters, positional deviation in recognizing upper/lower vowels and tone marks, and confusion between visually similar Lao characters. Based on a study of the writing levels of Lao characters and of lower-position (subscript) consonants, it proposes a multi-task character recognition method that effectively integrates the structural features of Lao characters. First, a deep residual network extracts the structural features of Lao characters from the character image, and bounding box regression corrects each single-character bounding box. Next, the corrected segmentation results and the extracted character features are fed jointly into a bidirectional long short-term memory (BiLSTM) network to predict the Lao character sequence, and connectionist temporal classification (CTC) aligns the predicted sequence. Finally, the model's predictions are refined according to the fixed combinations of Lao characters. Experimental results show that the method accurately recognizes segmented Lao character sequences, with a character error rate as low as 13.06%.
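The abstract names two standard components that a short example can make concrete: collapsing a per-frame label path under connectionist temporal classification, and the character error rate used as the evaluation metric. The following is a minimal sketch in plain Python, not the authors' implementation. The blank index `0`, the integer label ids, and the function names are assumptions made for illustration, and the abstract does not state which unit (code point or combined character) the reported error rate counts.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse a best-label-per-frame path: merge repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # delete r from the reference
                curr[j - 1] + 1,          # insert h into the reference
                prev[j - 1] + (r != h),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance divided by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical frame-wise predictions; the blank between the two 3s
    # keeps them as separate characters after collapsing.
    path = [0, 3, 3, 0, 3, 5, 5, 0]
    decoded = ctc_greedy_decode(path)
    print(decoded)
    print(character_error_rate([3, 3, 5], [3, 5]))
```

Greedy collapsing is only the simplest way to read out a CTC-trained network; beam search, or a post-processing step such as the fixed-combination refinement the abstract mentions, would operate on the same per-frame outputs.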

Key words

Lao printed character recognition / structural features of Lao characters / multi-task recognition / end-to-end model

Cite this article

CHEN Zhuo, ZHOU Lanjiang, HAO Yongbin, ZHANG Jian'an. A Multi-task Approach to Lao Character Recognition with Structural Features. Journal of Chinese Information Processing. 2023, 37(4): 34-44


Funding

National Natural Science Foundation of China (61662040)