Abstract
Existing methods for subjective/objective classification of micro-blog posts suffer from two problems: their features are highly redundant, and they do not exploit the complementarity among different feature selection methods. To address these problems, this paper proposes a subjective/objective classification approach for micro-blog posts based on fused features. The approach combines several feature selection methods and applies a feature fusion algorithm to select and fuse basic features, including word features, content features, and micro-blog-specific features, in order to obtain more effective features for subjective/objective classification. Experimental results on Sina Weibo data show that the feature fusion algorithm achieves better classification performance than the best single feature selection method.
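The abstract does not spell out how the feature selection methods are combined. The Python sketch below is only an illustration under stated assumptions and is not the authors' algorithm. It assumes two common text-categorization selectors, document frequency (DF) and chi-square (CHI), that each selector's scores are min-max normalized, and that fusion is an unweighted average followed by top-k selection. All function names, the document representation, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical document representation: (set of feature tokens, label),
# where label 1 = subjective and 0 = objective.
Doc = Tuple[Set[str], int]


def document_frequency(docs: Sequence[Doc]) -> Dict[str, float]:
    """DF selector: number of documents containing each feature."""
    df: Counter = Counter()
    for features, _ in docs:
        df.update(features)
    return {f: float(c) for f, c in df.items()}


def chi_square(docs: Sequence[Doc]) -> Dict[str, float]:
    """Chi-square selector: association of each feature with the subjective class."""
    n = len(docs)
    n_pos = sum(label for _, label in docs)
    pos: Counter = Counter()  # documents containing the feature, subjective
    neg: Counter = Counter()  # documents containing the feature, objective
    for features, label in docs:
        (pos if label == 1 else neg).update(features)
    scores: Dict[str, float] = {}
    for f in set(pos) | set(neg):
        a, b = pos[f], neg[f]
        c = n_pos - a        # subjective documents without the feature
        d = (n - n_pos) - b  # objective documents without the feature
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[f] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one selector's scores to [0, 1] so different selectors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {f: 1.0 for f in scores}
    return {f: (s - lo) / (hi - lo) for f, s in scores.items()}


def fuse(score_maps: List[Dict[str, float]], k: int) -> List[str]:
    """Assumed fusion rule: unweighted mean of normalized scores, then keep the top-k."""
    normalized = [min_max(m) for m in score_maps]
    vocab = set().union(*normalized)
    fused = {f: sum(m.get(f, 0.0) for m in normalized) / len(normalized) for f in vocab}
    # Sort by descending fused score; break ties alphabetically for determinism.
    return sorted(fused, key=lambda f: (-fused[f], f))[:k]


if __name__ == "__main__":
    toy_docs: List[Doc] = [
        ({"love", "!", "today"}, 1),
        ({"hate", "!", "today"}, 1),
        ({"release", "news", "today"}, 0),
        ({"release", "notice"}, 0),
    ]
    print(fuse([document_frequency(toy_docs), chi_square(toy_docs)], k=2))  # prints ['!', 'release']
```

In the toy data, "today" has the highest DF but is weakly discriminative, while "!" and "release" separate the classes perfectly. Averaging the two normalized views ranks the latter two first, which illustrates why combining selectors can be useful. The averaging rule is a simple placeholder; weighted or rank-based fusion would be equally plausible substitutes.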
Key words
micro-blog / subjective and objective classification / feature selection / fusion algorithm
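Funding
Research Planning Project of the State Language Commission of China for the 12th Five-Year Plan period (YB125-19); National Natural Science Foundation of China (61373082); National Natural Science Foundation of China (60970053); Shanxi Province Research Fund for Returned Overseas Scholars (2013-015); National High Technology Research and Development Program of China ("863" Program) (2006AA01Z142)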