CAI Zhijie, CAI Rangzhuoma. Research on the Distribution of Tibetan Character Forms[J]. Journal of Chinese Information Processing, 2016, 30(4): 98-105.
Research on the Distribution of Tibetan Character Forms
CAI Zhijie1, CAI Rangzhuoma1,2
1. Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining, Qinghai 810008, China; 2. College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062, China
Abstract: Research on the distribution of Tibetan character forms is a foundation of natural language processing and provides a theoretical basis for word attribute analysis, input design, sorting, speech synthesis, and character information entropy studies. This paper classifies Tibetan character forms into single-element characters and combined-element characters, and further classifies the combined-element characters by the structure and number of their components. Based on a statistical analysis of glyph structure over a 450M corpus containing 85 million Tibetan words, it establishes distribution statistics for Tibetan glyph structures.
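The kind of counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes that Tibetan characters (syllables) are delimited by the tsheg mark (U+0F0B), and it approximates a structural element (component) as one Unicode code point among the Tibetan consonants, subjoined consonants, and vowel signs. The function names and the labels `single-element` and `combined-N` are hypothetical, and the sketch groups characters only by element count, not by the finer structural classes the paper uses.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (assumed character delimiter)
SHAD = "\u0f0d"   # Tibetan clause/sentence mark


def is_component(ch):
    """Assumption: one consonant, subjoined consonant, or vowel sign = one component."""
    cp = ord(ch)
    return (
        0x0F40 <= cp <= 0x0F6C      # base consonants
        or 0x0F71 <= cp <= 0x0F7D   # vowel signs (approximate range)
        or 0x0F90 <= cp <= 0x0FBC   # subjoined consonants
    )


def split_characters(text):
    """Split running text into Tibetan characters (syllables) at tsheg, shad, or whitespace."""
    for sep in (SHAD, " ", "\n"):
        text = text.replace(sep, TSHEG)
    return [s for s in text.split(TSHEG) if s]


def structure_label(syllable):
    """Label a character by its number of components (hypothetical labels)."""
    n = sum(1 for ch in syllable if is_component(ch))
    if n == 0:
        return "other"
    return "single-element" if n == 1 else f"combined-{n}"


def structure_distribution(text):
    """Return the relative frequency of each structure label in the text."""
    counts = Counter(structure_label(s) for s in split_characters(text))
    counts.pop("other", None)  # ignore tokens with no Tibetan components
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()} if total else {}


if __name__ == "__main__":
    sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b\u0f44\u0f0b"
    print(structure_distribution(sample))
```

On a real corpus the same counting would run file by file, with the per-file counters summed before normalising.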