Seyyare Imam, Hussein Yusuf, Abdusalam Dawut. Features of Latin Transcriptions of Uyghur Characters and Its Normalization Based on Rules[J]. Journal of Chinese Information Processing, 2016, 30(3): 60-67.
Features of Latin Transcriptions of Uyghur Characters and Its Normalization Based on Rules
Seyyare Imam (1), Hussein Yusuf (2), Abdusalam Dawut (3)
1. Institute of Politics and Public Administration, Xinjiang University, Urumqi, Xinjiang 830046, China; 2. Institute of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; 3. School of Software, Xinjiang University, Urumqi, Xinjiang 830046, China
Abstract: A rule-based normalization method is presented for the Latin transcriptions of Uyghur characters that are popular on the Web. First, we build a large-scale text corpus comprising four types of datasets: a set of fixed words, a set of word-initial letter sequences, a set of suffix letter sequences, and a set of special words. We then normalize Uyghur Latin transcriptions using the characteristics of the letter sequence within a word and the context of adjacent letters, via the minimum edit distance. Finally, the experimental results are analyzed in detail and directions for further research are discussed.
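The minimum edit distance step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the rules, the dataset contents, or any thresholds, so the function names `edit_distance` and `normalize`, the `max_distance` cutoff, and the three-word toy lexicon are all assumptions made purely for illustration. The sketch simply maps an input word to the closest entry of a fixed-word set under the standard Levenshtein distance.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(word: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the word itself if nothing is close.

    `max_distance` is a hypothetical cutoff, not a value from the paper.
    Ties are broken alphabetically so the result is deterministic.
    """
    best = min(lexicon, key=lambda w: (edit_distance(word, w), w))
    return best if edit_distance(word, best) <= max_distance else word


# Toy fixed-word set (illustrative placeholder entries only).
FIXED_WORDS = ["mektep", "kitab", "adem"]

if __name__ == "__main__":
    print(normalize("mektap", FIXED_WORDS))
```

A real system along the lines described in the abstract would presumably consult the word-initial and suffix letter-sequence sets and adjacent-letter context before falling back to a distance-based lookup like this one; the sketch covers only that final lookup.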