Abstract
Complex backgrounds and diverse writing styles cause full-page text recognition models to have low accuracy and to converge poorly. To address this, this paper improves a transfer-learning-based full-page recognition algorithm, the vertical attention network (VAN). First, 319 images from Volume 1 of the Dunhuang Tibetan Documents in the National Library of France were collected and annotated. The dataset was then expanded to 2,367 images by methods such as synthesizing printed-style text. Second, two changes strengthen line feature extraction and speed up network convergence: the line feature extraction module uses adaptive average pooling, and the decoder uses a gated recurrent unit (GRU). Finally, the model trained on text lines is transferred to the improved full-page text recognition task to recognize Dunhuang Tibetan script. Experimental results show that, when line-level annotations are available, transfer learning reduces the character error rate by 0.73% compared with mainstream full-page recognition models. This shows that the model is effective for full-page text recognition when data are scarce.
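The reported gain is stated in terms of character error rate (CER). CER is conventionally the Levenshtein (edit) distance between the predicted and reference transcriptions, divided by the number of reference characters. The code below is a minimal pure-Python sketch of that definition. It assumes that one "character" is one Unicode code point. The abstract does not state how Tibetan text is tokenized (code points, stacks, or syllables) or how errors are aggregated over pages. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # Assumption: one character = one Unicode code point (one element of a Python str).
    # prev[j] holds the edit distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total if total else 0.0


if __name__ == "__main__":
    # One substitution over five reference characters.
    print(character_error_rate(["abc", "de"], ["abd", "de"]))
```

For example, `character_error_rate(["abc", "de"], ["abd", "de"])` counts 1 edit over 5 reference characters, so the CER is 0.2. A Tibetan syllable such as བོད is encoded as three code points (U+0F56, U+0F7C, U+0F51). Counting grapheme clusters or syllables instead would change both the error count and the denominator. The chosen unit therefore affects how a figure such as 0.73% should be read.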
Key words
text recognition /
transfer learning /
end-to-end full-page text recognition
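Funding
National Natural Science Foundation of China (62166038); 2021 First-Class Course Construction Project of the Tibet Autonomous Region; Key Project of the 2022 High-Level Talent Cultivation Program of Tibet University for Master's Students of the 2020 Cohort (2020-GSP-S182)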