Uyghur Text Sentiment Analysis by Combining Lexical Knowledge with Machine Learning Methods

Rexidanmu Tuerhongtai; Wushour Silamu; Yierxiati Tuergong

Journal of Chinese Information Processing ›› 2017, Vol. 31 ›› Issue (1): 177-183.
Sentiment Analysis and Social Computing

Uyghur Text Sentiment Analysis by Combining Lexical Knowledge with Machine Learning Methods

  • Rexidanmu Tuerhongtai1,2, Wushour Silamu1, Yierxiati Tuergong1

Abstract

As Internet use has grown, a large volume of online Uyghur text has appeared, creating an urgent need for sentiment analysis across different domains. Because Uyghur has neither sufficient sentiment training data nor a complete sentiment lexicon, this paper combines the strengths of the lexicon-based and machine learning (corpus-based) methods in a classifier model called LCUSCM (Lexicon-based and Corpus-based Uyghur Text Sentiment Classification Model). The model first classifies the corpus with a manually built Uyghur sentiment lexicon, expanding the lexicon incrementally during classification. It then uses each sentence's sentiment score to select a subset of the lexicon-classified sentences, trains a classifier on that subset, and uses the classifier to refine the results of the first step. The accuracy of the hybrid method is 9.13% higher than that of the machine learning method alone and 1.82% higher than that of the lexicon-based method.
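The two-step procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the expansion rule (an unknown word joins the lexicon when it occurs at least `min_count` times and only in sentences of one polarity), the score threshold for choosing training sentences, the use of a small Naive Bayes classifier, and the toy English tokens standing in for Uyghur words.

```python
import math
from collections import Counter

def lexicon_score(tokens, lexicon):
    """Sum the polarity weights of the tokens that appear in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in tokens)

def expand_lexicon(docs, lexicon, min_count=2):
    """One expansion pass (assumed rule): add unknown words that occur at least
    min_count times and only in sentences of a single polarity."""
    seen = {}  # word -> Counter of sentence polarities it appeared in
    for tokens in docs:
        score = lexicon_score(tokens, lexicon)
        if score == 0:
            continue
        polarity = 1 if score > 0 else -1
        for tok in set(tokens):
            if tok not in lexicon:
                seen.setdefault(tok, Counter())[polarity] += 1
    added = {}
    for tok, counts in seen.items():
        if len(counts) == 1:
            polarity, n = next(iter(counts.items()))
            if n >= min_count:
                added[tok] = polarity
    lexicon.update(added)
    return added

def train_naive_bayes(docs, labels):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {1: Counter(), -1: Counter()}
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = set(word_counts[1]) | set(word_counts[-1])
    return word_counts, class_counts, vocab

def predict_naive_bayes(model, tokens):
    """Return the label (+1 or -1) with the higher add-one-smoothed log probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = 1, -math.inf
    for label in (1, -1):
        logp = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words unseen in training are ignored
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

def classify_hybrid(docs, seed_lexicon, threshold=2):
    """Step 1: lexicon scoring with repeated expansion.
    Step 2: train on high-|score| sentences and relabel the remaining ones."""
    lexicon = dict(seed_lexicon)
    while expand_lexicon(docs, lexicon):  # repeat until no new word is added
        pass
    scores = [lexicon_score(d, lexicon) for d in docs]
    sign = lambda s: 1 if s > 0 else -1
    train = [(d, sign(s)) for d, s in zip(docs, scores) if abs(s) >= threshold]
    model = train_naive_bayes([d for d, _ in train], [y for _, y in train])
    labels = [sign(s) if abs(s) >= threshold else predict_naive_bayes(model, d)
              for d, s in zip(docs, scores)]
    return labels, lexicon

# Toy usage with hypothetical data.
docs = [["good", "great", "film"], ["good", "great", "acting"],
        ["bad", "awful", "film"], ["bad", "awful", "plot"],
        ["great", "story"], ["awful", "ending"]]
labels, lexicon = classify_hybrid(docs, {"good": 1, "bad": -1})
print(labels)
```

In this toy run the last two sentences contain no seed word, so the seed lexicon alone cannot score them. Expansion adds "great" and "awful", and the classifier trained on the four high-scoring sentences assigns the remaining labels. Keeping the lexicon label for high-scoring sentences and using the classifier only for the rest is one possible reading of "refine"; the paper may apply the classifier differently.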

Key words

Uyghur / sentiment lexicon / sentiment analysis / machine learning

Cite this article

Rexidanmu Tuerhongtai; Wushour Silamu; Yierxiati Tuergong. Uyghur Text Sentiment Analysis by Combining Lexical Knowledge with Machine Learning Methods. Journal of Chinese Information Processing. 2017, 31(1): 177-183

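Funding

National Key Basic Research Program of China ("973" Program) (2014CB340506); National Natural Science Foundation of China (61363063); Open Project of the Multilingual Key Laboratory, Xinjiang University (XJDX0905-2013-02)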