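Text Detection Oriented Super Resolution Reconstruction of Tibetan Ancient Scripts

HAO Yusheng, LI Jianwei, WANG Weilan, WANG Xiaojuan, LIN Qiang

Journal of Chinese Information Processing ›› 2024, Vol. 38 ›› Issue (10): 53-62.
Information Processing for Ethnic, Cross-Border, and Neighboring Languages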

Text Detection Oriented Super Resolution Reconstruction of Tibetan Ancient Scripts

  • HAO Yusheng1,2, LI Jianwei1, WANG Weilan1, WANG Xiaojuan1, LIN Qiang1,2

Abstract

Tibetan ancient document images are commonly of low quality and poor visual effect, which seriously hampers the detection and recognition of text regions in them. To address this problem, this paper constructs TAMSRD (Tibetan Ancient Manuscripts Super-resolution Dataset), a super-resolution dataset of Tibetan ancient document images, and proposes a super-resolution reconstruction method based on a convolutional neural network (CNN), providing a useful reference for super-resolution reconstruction of Tibetan ancient document images. Experiments on five datasets, ICDAR 2013/2015/2017, MSRA_TD500, and TAMSRD, show that: (1) the proposed super-resolution network effectively improves the visual quality of low-quality Tibetan ancient document images, with clear improvements in the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and natural image quality evaluator (NIQE) of the reconstructed images; (2) reconstructing low-quality Tibetan ancient document images with the super-resolution network substantially improves the performance of various scene text detection models. Across the datasets, the recall and F-measure of the MSER method improve by [16.3%, 32.5%] and [13.3%, 41.9%], those of the CTPN method by [4.1%, 39.8%] and [2.1%, 32.7%], and those of the DB method by [8.4%, 56.5%] and [7.7%, 58.7%], respectively.
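The abstract reports image quality through PSNR and detection quality through recall and F-measure. As a reading aid, the sketch below computes both quantities from their standard textbook definitions. It is not the paper's evaluation code: the function names `psnr` and `f_measure`, the 8-bit peak value of 255, and the toy pixel values are assumptions made only for illustration.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 assumed for 8-bit images).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values, not data from the paper.
    high_res = [52, 55, 61, 66, 70, 61, 64, 73]
    rebuilt = [54, 55, 60, 67, 69, 62, 64, 71]
    print(f"PSNR: {psnr(high_res, rebuilt):.2f} dB")
    print(f"F-measure: {f_measure(0.8, 0.6):.4f}")
```

A higher PSNR means the reconstructed image is closer to the reference. Because the F-measure is a harmonic mean, it rises only when precision and recall improve together, which is why the abstract reports recall and F-measure gains side by side.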

Key words

super resolution / Tibetan ancient scripts / document image / image quality assessment

Cite this article

HAO Yusheng, LI Jianwei, WANG Weilan, WANG Xiaojuan, LIN Qiang. Text Detection Oriented Super Resolution Reconstruction of Tibetan Ancient Scripts. Journal of Chinese Information Processing. 2024, 38(10): 53-62

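Funding

National Natural Science Foundation of China (62166036); Fundamental Research Funds for the Central Universities (31920220132); Innovation Fund for Higher Education Institutions of Gansu Province (2021B-067); Ministry of Education Industry-University Cooperative Education Program (202102383034); Science and Technology Program of Gansu Province (22JR5RA187); General Project of Education and Teaching Reform Research of Northwest Minzu University (2023XJYBJG-43)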
