Ethnic Language and Cross Language Information Processing
Raxida Turhuntay, Wushour Slamu
2018, 32(8): 80-90.
Current Uyghur text sentiment classification methods use unigram features obtained by segmenting on spaces as the text representation, and therefore cannot capture the deeper linguistic phenomena related to emotional expression. Based on the word-order dependence of the Uyghur language, this paper summarizes several rules, extracts Bi-tagged features that can express rich emotional information, and classifies Uyghur sentiment corpora with a support vector machine (SVM) classifier. The results indicate that, in Uyghur text sentiment classification: (1) the Bi-tagged features achieve the best results when they cover all of the part-of-speech rules presented in this paper; (2) the Bi-tagged features capture rich emotional information as well as negative information; (3) compared with unigram features, bigram features, and their combination on the datasets used in this paper, the combination of Bi-tagged and unigram features leads to improved performance, with a classification accuracy 4.225% higher than the baseline used in this paper. These results further improve the performance of Uyghur text sentiment classification. In addition, the methods presented in this paper can serve as a reference for the sentiment classification of closely related languages such as Kazakh and Kirgiz.
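
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the part-of-speech pair rules in `ASSUMED_RULES`, the tag names, the `w1_w2` joining convention, and the placeholder tokens are all assumptions made for illustration, since the abstract does not list the actual rules. The sketch only shows the general idea of keeping adjacent word pairs whose tags match a rule and merging them with unigram features.

```python
from collections import Counter

# Hypothetical POS-pair rules. The paper's actual rule set is not given
# in the abstract, so these three pairs are placeholders only.
ASSUMED_RULES = {
    ("ADJ", "NOUN"),
    ("ADV", "ADJ"),
    ("ADV", "VERB"),
}


def unigram_features(tagged):
    """Return the words of a POS-tagged sentence as unigram features."""
    return [word for word, _ in tagged]


def bi_tagged_features(tagged, rules=ASSUMED_RULES):
    """Return adjacent word pairs whose (tag, tag) sequence matches a rule."""
    feats = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in rules:
            feats.append(f"{w1}_{w2}")
    return feats


def combined_feature_counts(tagged, rules=ASSUMED_RULES):
    """Count unigram and Bi-tagged features together for one sentence."""
    return Counter(unigram_features(tagged) + bi_tagged_features(tagged, rules))


# Placeholder tokens stand in for Uyghur words; the tags are assumed to
# come from an external POS tagger that is not shown here.
example = [("w1", "ADV"), ("w2", "ADJ"), ("w3", "NOUN"), ("w4", "VERB")]
features = combined_feature_counts(example)
```

Under these assumptions, the resulting count dictionaries could be vectorized and passed to any SVM implementation; which kernel, weighting scheme, and feature selection the paper uses is not stated in the abstract.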