Feature Polymeric Topology Model for Short-Text Sentiment Classification
HU Yang 1, FENG Xupeng 2, HUANG Qingsong 1,3, FU Xiaodong 1, LIU Li 1, LIU Lijun 1
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; 3. Yunnan Key Laboratory of Computer Technology Applications, Kunming, Yunnan 650500, China
Abstract: Short texts are extremely sparse and their features are dispersed, which degrades sentiment classification performance. To address this problem, we propose a feature polymeric topology model for short-text sentiment classification. The model combines three signals into a single correlation measure between sentiment features: the mutual information among features, the similarity of their sentiment orientation, and the difference in their topic ascription. This correlation is then used to build a polymeric topology graph, in which each strongly connected component is treated as a group of the most similar sentiment features. Finally, the model supplements the training feature set with similar features drawn from unlabeled corpora and, at the same time, reduces the dimensionality of the training space. In experiments, the proposed model improves precision and recall by 0.03 and 0.027, respectively.
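The graph step described in the abstract can be illustrated with a minimal sketch. It assumes the pairwise correlation scores have already been computed and are directional; the abstract does not say how the three signals are combined, so the scores here are simply given as input. The names `aggregate_features`, `correlation` and `threshold` are hypothetical, and Kosaraju's algorithm is used only as one standard way to find strongly connected components, not necessarily the one the authors used.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: return a list of sets, one per component."""
    graph = defaultdict(list)
    reverse = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: iterative DFS that records nodes in order of finishing time.
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finishing time.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def aggregate_features(correlation, threshold):
    """Group features whose correlation links form a strongly connected component.

    correlation: dict mapping (feature_a, feature_b) -> score (assumed input).
    threshold:   minimum score for a directed edge (assumed parameter).
    """
    nodes = sorted({feature for pair in correlation for feature in pair})
    edges = [pair for pair, score in correlation.items() if score >= threshold]
    return strongly_connected_components(nodes, edges)
```

With made-up scores in which "good", "great" and "nice" point to one another above the threshold, those three features fall into one group, while a feature linked in only one direction stays in a group of its own. Each multi-feature group could then be mapped to a single dimension, which is one way to read the abstract's claim that the model both supplements the feature set and reduces the training space.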