Abstract
Sentiment classification is currently the main approach to document-level sentiment analysis, but it has difficulty incorporating the structural features of Chinese text. To address this problem, a cascaded model is adopted that divides document sentiment polarity analysis into two levels, the clause level and the document level, thereby introducing clause-level sentiment analysis into the document-level task. The document is first segmented into clauses, which a Maximum Entropy (ME) model classifies as positive or negative. The clause-level outputs are then combined with sentence-type and clause-position features and passed to a Support Vector Machine (SVM) model for document-level classification. To deal with the two-layer annotation that such a cascade would otherwise require, a single-label cascade model based on the idea of cross-validation is also proposed, which avoids multi-layer annotation work and the errors it introduces. Experimental results show that the accuracy of the proposed method is 2.53% higher than that of traditional sentiment classification methods.
Key words: sentiment analysis; sentiment classification; cascade model; maximum entropy (ME); support vector machine (SVM)
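The two-level cascade summarized in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. It assumes that (a) each clause inherits the label of its document, so only document-level labels are annotated, and (b) "based on cross-validation" means that the clause scores used as document-level features come from a clause model trained on the other folds. Binary logistic regression stands in for the maximum entropy model (the two coincide for two classes), and a linear SVM trained by subgradient descent on the hinge loss stands in for the SVM. Sentence-type features are omitted, and all function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_clause_model(clauses, labels, epochs=200, lr=0.5):
    """Binary logistic regression over bag-of-words clauses (labels are 0 or 1)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, y in zip(clauses, labels):
            p = sigmoid(sum(w[t] for t in words))
            for t in words:
                w[t] += lr * (y - p)  # gradient step on the log-likelihood
    return w


def clause_prob(w, words):
    """Probability that a clause is positive under the clause model."""
    return sigmoid(sum(w.get(t, 0.0) for t in words))


def out_of_fold_clause_probs(docs, labels, k=2):
    """Score each document's clauses with a model trained on the other folds."""
    probs = [None] * len(docs)
    for fold in range(k):
        clauses, clause_labels = [], []
        for i, doc in enumerate(docs):
            if i % k != fold:
                clauses.extend(doc)
                # Assumption: every clause inherits its document's label.
                clause_labels.extend([labels[i]] * len(doc))
        w = train_clause_model(clauses, clause_labels)
        for i, doc in enumerate(docs):
            if i % k == fold:
                probs[i] = [clause_prob(w, c) for c in doc]
    return probs


def doc_features(probs):
    """Document features: mean, first-clause and last-clause polarity in [-1, 1]."""
    scores = [2.0 * p - 1.0 for p in probs]
    return [sum(scores) / len(scores), scores[0], scores[-1]]


def train_doc_svm(X, labels, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, labels):
            t = 1.0 if label == 1 else -1.0
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss is active for this example
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def predict_doc(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy data (invented): each document is a list of pre-tokenized clauses.
docs = [
    [["good", "movie"], ["great", "cast"]],
    [["bad", "movie"], ["awful", "cast"]],
    [["awful", "movie"], ["bad", "cast"]],
    [["great", "movie"], ["good", "cast"]],
]
labels = [1, 0, 0, 1]  # document-level labels only

clause_probs = out_of_fold_clause_probs(docs, labels, k=2)
features = [doc_features(p) for p in clause_probs]
svm = train_doc_svm(features, labels)
predictions = [predict_doc(svm, x) for x in features]
```

In this sketch the document-level predictions are made on the same documents the SVM was trained on, which only demonstrates the data flow; a real evaluation would hold out test documents. The first-clause and last-clause scores are one simple way to encode clause position, chosen here for illustration.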
Funding
Supported by the National Natural Science Foundation of China (60975077, 90924015)