Abstract
This paper addresses the sentiment rating of online reviews and proposes a multi-level sentiment classification method based on the bag-of-opinions model and a set of linguistic rules. By analyzing the part-of-speech collocations of the words in each sentence, 12 patterns are designed for extracting feature-opinion pairs, and strategies are given for the problems that arise during extraction. These patterns allow the whole text to be represented as a series of four-tuples of the form "feature, degree word, opinion word, negation word". Drawing on the characteristics of Chinese word usage and the special usage of some words in the automotive domain, a method is proposed for computing the sentiment polarity value of each four-tuple. The extracted four-tuples and their polarity values are then used to build a vector representation of the text, together with a weighting formula. Finally, cosine similarity between text vectors is used to classify reviews into five levels of sentiment polarity. Experiments on the car dataset of COAE2012 Task 3 show good classification results compared with the other runs submitted to COAE.
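The pipeline summarized in the abstract (four-tuples, a polarity value per tuple, a text vector, and cosine similarity for five-level classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicons, the degree weights, the bucket thresholds, the five-bin histogram vector, and the prototype vectors are all hypothetical values chosen for illustration, since the abstract does not give the actual formulas.

```python
import math
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """Four-tuple: feature, degree word, opinion word, negation word."""
    feature: str
    degree: Optional[str]
    opinion: str
    negation: Optional[str]


# Hypothetical lexicons (assumptions, not taken from the paper).
OPINION_POLARITY = {"spacious": 1.0, "comfortable": 1.0, "noisy": -1.0}
DEGREE_WEIGHT = {"very": 1.5, "slightly": 0.5}

# Hypothetical prototype vectors for the five levels (-2 .. 2).
PROTOTYPES = {
    -2: [3, 1, 0, 0, 0],
    -1: [1, 3, 1, 0, 0],
    0: [0, 1, 3, 1, 0],
    1: [0, 0, 1, 3, 1],
    2: [0, 0, 0, 1, 3],
}


def quad_polarity(q: Quad) -> float:
    """Opinion polarity, scaled by the degree word, flipped by negation."""
    score = OPINION_POLARITY.get(q.opinion, 0.0)
    score *= DEGREE_WEIGHT.get(q.degree, 1.0)
    return -score if q.negation else score


def bucket(score: float) -> int:
    """Map a polarity value to one of five levels (assumed thresholds)."""
    if score <= -1.25:
        return -2
    if score < 0:
        return -1
    if score == 0:
        return 0
    if score < 1.25:
        return 1
    return 2


def text_vector(quads: list) -> list:
    """Count how many four-tuples of a text fall into each level."""
    vec = [0, 0, 0, 0, 0]
    for q in quads:
        vec[bucket(quad_polarity(q)) + 2] += 1
    return vec


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(quads: list) -> int:
    """Return the level whose prototype is most similar to the text."""
    vec = text_vector(quads)
    if not any(vec):
        return 0  # no opinions found: treated as neutral in this sketch
    return max(PROTOTYPES, key=lambda lvl: cosine(vec, PROTOTYPES[lvl]))


review = [
    Quad("space", "very", "spacious", None),
    Quad("seat", None, "comfortable", None),
    Quad("engine", None, "noisy", "not"),
]
print(classify(review))
```

In practice the prototype for each level would more plausibly be estimated from labeled reviews (for example as a centroid of their vectors) rather than written by hand as it is here.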
Key words
sentiment classification /
bag of opinion /
POS collocation
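Funding
National Natural Science Foundation of China (61175067, 61272095); Natural Science Foundation of Shanxi Province (2010011021-1, 2013011066-4); Shanxi Province Key Science and Technology Project (20110321027-02); Shanxi Province Overseas Study Fund (2013-014)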