Abstract
Tibetan text recognition has important applications in the processing of ancient Tibetan documents, Tibetan office automation, and Tibetan-Chinese bilingual education. As one of the two common Tibetan fonts, the Cursive Script (Umê) font has severe stroke adhesion and interleaving, which makes recognition difficult. This paper therefore proposes a method for recognizing long Cursive Script text based on Rcnn+Char_SegNet. First, recurrent connections are added to each convolutional layer of the CNN to strengthen its ability to extract features of adhered Cursive Script fragments and to integrate context information. Second, the extracted image text feature sequence is modeled with a BiLSTM. Finally, a character segmentation module is used to strengthen the CTC module's supervision of the alignment between the image sequence and the labels. On the self-constructed Cursive Script-C517 test dataset, the highest and average accuracy of the proposed model reach 99.80% and 91.43%, which are 1.45 and 48.47 percentage points higher than the baseline, respectively. In addition, training with a character-level dictionary reduces the model's training time by 13.63%. Experiments show that the method effectively resolves the recognition errors caused by severe stroke adhesion and interleaving in the Cursive Script font, significantly improves recognition accuracy for printed Tibetan Cursive Script, shortens training time, and is robust.
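The role of CTC in aligning image frames with labels can be illustrated with the standard best-path (greedy) decoding rule: take the top label per frame, merge consecutive repeats, then drop blanks. The sketch below is only an illustration of that general rule, not the paper's Rcnn+Char_SegNet implementation; the blank index `0`, the function name `ctc_greedy_decode`, and the frame scores are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumption: label index 0 is the CTC blank


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding over a list of per-frame score vectors."""
    # Pick the highest-scoring label index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats, then remove blank labels.
    return [label for label, _ in groupby(best_path) if label != BLANK]


# Hypothetical scores for 6 frames over 3 labels (0 = blank).
frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # blank (separates the two 1s)
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
    [0.10, 0.10, 0.80],  # label 2 (repeat, merged)
]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

Because the blank frame separates the two runs of label 1, both survive decoding. Frames that straddle adhered strokes are where such alignments become ambiguous, which is the situation the abstract's character segmentation module is described as addressing.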
Key words
recurrent convolutional neural network /
printed Tibetan recognition /
image sequence recognition /
printed Tibetan cursive script recognition /
Tibetan character segmentation
Funding
Qinghai Province Science and Technology Program (2017-GX-146); National Natural Science Foundation of China (62066039, 62166034)