Abstract
This paper proposes a weakly supervised sentiment analysis method based on multi-level linguistic features. A small set of sentiment words forms the initial sentiment lexicon. Guided by these seed words, the method uses linguistic features of the review text at the word, phrase, and sentence levels, together with context, to mine words and phrases in the target text that potentially carry sentiment polarity. The lexicon is expanded iteratively through self-training, yielding a domain-specific sentiment lexicon, which is then used to classify the sentiment polarity of the target text. Compared with the results of other methods on the same data, this method achieves the highest F-score with a very small lexicon, and the sentiment words it acquires have clear meanings. The method also achieves relatively high accuracy when applied to texts from different domains, indicating good domain adaptability.
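The seed-driven, self-training lexicon expansion summarized in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not give the word-, phrase-, and sentence-level features, so the sketch substitutes a plain sentence-level co-occurrence vote. The function names `expand_lexicon` and `classify`, the `min_score` threshold, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def expand_lexicon(sentences, seeds, min_score=2, max_rounds=5):
    """Grow a polarity lexicon from seed words by simple self-training.

    sentences: list of token lists (assumed already segmented).
    seeds: dict mapping word -> +1 (positive) or -1 (negative).
    Assumption: an unknown word inherits the polarity of the sentences
    it appears in; the paper's multi-level features are not modeled.
    """
    lexicon = dict(seeds)
    for _ in range(max_rounds):
        votes = defaultdict(int)
        for tokens in sentences:
            # Sentence polarity = sum of polarities of known words in it.
            polarity = sum(lexicon.get(t, 0) for t in tokens)
            if polarity == 0:
                continue  # no evidence, or evidence cancels out
            sign = 1 if polarity > 0 else -1
            for t in set(tokens):
                if t not in lexicon:
                    votes[t] += sign
        # Accept only candidates with enough consistent votes.
        new_words = {w: (1 if v > 0 else -1)
                     for w, v in votes.items() if abs(v) >= min_score}
        if not new_words:
            break  # lexicon has stopped growing
        lexicon.update(new_words)
    return lexicon


def classify(tokens, lexicon):
    """Label a tokenized text by the sum of its lexicon polarities."""
    score = sum(lexicon.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy data (hypothetical), with two seed words.
sentences = [
    ["good", "tasty"], ["good", "tasty", "cheap"],
    ["bad", "bland"], ["bad", "bland", "slow"],
    ["tasty", "cheap"], ["bland", "slow"],
]
lexicon = expand_lexicon(sentences, {"good": 1, "bad": -1})
label = classify(["cheap", "slow", "tasty"], lexicon)
```

On this toy data the first round adds "tasty" and "bland", and the second round uses them to add "cheap" and "slow", which shows how words learned earlier guide later rounds. A real system would at least need stop-word filtering and negation handling, which this sketch omits.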
Key words
sentiment analysis /
multi-level linguistic features /
weakly supervised method /
sentiment lexicon
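Funding
National Natural Science Foundation of China (61202132); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20103218120024); Fundamental Research Funds for the Central Universities (NS2012073)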