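Online Handwritten Sample Generated Based on Component Combination for Tibetan-Sanskrit

WANG Weilan, LU Xiaobao, CAI Zhengqi, SHEN Wentao, FU Ji, CAIKE Zhaxi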

Journal of Chinese Information Processing, 2017, Vol. 31, Issue 5: 64-73.
Section: Information Processing for Ethnic Minority and Neighboring Languages


Abstract

The Tibetan-Sanskrit character set comprises more than 500 modern Tibetan characters and more than 6,000 Sanskrit-transliteration Tibetan characters, which makes it a large-category character set for character recognition. Collecting online handwritten samples for it is therefore a large and complex undertaking. To address this, we present a method for generating handwritten Tibetan-Sanskrit samples based on component combination. The method has four main steps: (1) determine the Tibetan-Sanskrit character set and component set; (2) obtain the component position information for each Tibetan-Sanskrit character; (3) collect online handwritten samples of the Tibetan-Sanskrit components; and (4) generate the sample database for the online handwritten Tibetan-Sanskrit character set. The work provides training and test sample sets for research on online handwritten Tibetan-Sanskrit recognition. It improves the efficiency of collecting handwritten Sanskrit-transliteration Tibetan samples, addresses the problems of sample quantity and diversity, reduces collection cost, and lays a foundation for further research and system development in online handwritten Sanskrit-transliteration Tibetan recognition.
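The four steps in the abstract can be illustrated with a minimal Python sketch of step (4), in which component samples are placed into a character frame according to position information. Everything concrete here is an assumption made for illustration: the abstract does not specify the data format, so the stroke representation, the bounding-box layout `(left, top, width, height)`, the component names, and the helper functions `normalize`, `place`, and `compose` are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]                      # one pen-down trajectory
Box = Tuple[float, float, float, float]   # hypothetical layout: (left, top, width, height)


def normalize(strokes: List[Stroke]) -> List[Stroke]:
    """Rescale a component's strokes into the unit square [0, 1] x [0, 1]."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 so a perfectly horizontal or vertical component
    # does not cause a division by zero.
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    return [[((x - min_x) / width, (y - min_y) / height) for x, y in stroke]
            for stroke in strokes]


def place(strokes: List[Stroke], box: Box) -> List[Stroke]:
    """Map a component's strokes into the given box of the character frame."""
    left, top, width, height = box
    return [[(left + x * width, top + y * height) for x, y in stroke]
            for stroke in normalize(strokes)]


def compose(layout: List[Tuple[str, Box]],
            samples: Dict[str, List[Stroke]]) -> List[Stroke]:
    """Build one character sample by placing each component in layout order."""
    character: List[Stroke] = []
    for name, box in layout:
        character.extend(place(samples[name], box))
    return character


# Hypothetical component samples (step 3) and position information (step 2).
component_samples = {
    "top": [[(0.0, 0.0), (10.0, 0.0)], [(5.0, 0.0), (5.0, 10.0)]],
    "bottom": [[(2.0, 2.0), (4.0, 6.0)]],
}
character_layout = [("top", (0.0, 0.0, 100.0, 40.0)),
                    ("bottom", (20.0, 40.0, 60.0, 60.0))]

generated = compose(character_layout, component_samples)
```

Under these assumptions, pairing the same layout with component samples written by different people would yield different samples of the same character, which is one way a small set of collected components could cover a character set of several thousand classes.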

Key words

online handwritten / Tibetan-Sanskrit / character set / component combination / sample generation

Cite this article

WANG Weilan, LU Xiaobao, CAI Zhengqi, SHEN Wentao, FU Ji, CAIKE Zhaxi. Online Handwritten Sample Generated Based on Component Combination for Tibetan-Sanskrit. Journal of Chinese Information Processing. 2017, 31(5): 64-73


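Funding

National Natural Science Foundation of China (61375029); Leading Talent Program of the State Ethnic Affairs Commission; Fundamental Research Funds for the Central Universities, Northwest Minzu University (31920170142).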