Abstract
This paper investigates how to automatically identify sentences in microblogs in which users express opinions about automobile brands. Because microblogs contain a large amount of advertising and product-release information about cars, opinion sentences written by actual car users make up only a small proportion of the data. To address this, the paper proposes an SVM-based classification method that combines microblog data with car review data for training. The selected features include words, the number of opinion words, words related to the opinion target, and microblog-specific features such as emoticons and user type. Experimental results indicate that the opinion word feature and some of the microblog-specific features effectively improve classifier performance, and that a classifier trained on both microblog and car review data outperforms one trained on microblog data alone.
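As a rough illustration of the kind of feature set the abstract lists, the sketch below maps a tokenized sentence to a sparse feature dictionary and pools microblog and review examples into one training set. It is not the authors' implementation: the lexicon `OPINION_WORDS`, the set `EMOTICONS`, the function names, and the `user_type` values are all hypothetical, and the feature for words related to the opinion target is omitted because the abstract does not say how it is computed.

```python
from collections import Counter

# Hypothetical resources; the paper's actual opinion lexicon and
# emoticon inventory are not given in the abstract.
OPINION_WORDS = {"good", "comfortable", "noisy", "expensive"}
EMOTICONS = {":)", ":(", "[haha]", "[angry]"}


def extract_features(tokens, user_type="unknown"):
    """Map a tokenized sentence to a sparse feature dict."""
    features = Counter()
    for tok in tokens:
        features["word=" + tok.lower()] += 1  # bag-of-words counts
    # Number of tokens found in the (assumed) opinion lexicon.
    features["n_opinion_words"] = sum(
        tok.lower() in OPINION_WORDS for tok in tokens
    )
    # Microblog-specific features: emoticon presence and user type.
    features["has_emoticon"] = int(any(tok in EMOTICONS for tok in tokens))
    features["user_type=" + user_type] = 1
    return dict(features)


def build_training_set(microblog_examples, review_examples):
    """Pool both corpora into one list of feature dicts and labels.

    microblog_examples: iterable of (tokens, label, user_type)
    review_examples:    iterable of (tokens, label); reviews carry no
                        user type, so the default "unknown" is used.
    """
    X, y = [], []
    for tokens, label, user_type in microblog_examples:
        X.append(extract_features(tokens, user_type))
        y.append(label)
    for tokens, label in review_examples:
        X.append(extract_features(tokens))
        y.append(label)
    return X, y
```

The resulting dictionaries could then be vectorized and passed to any SVM implementation, for example scikit-learn's `DictVectorizer` followed by `LinearSVC`; that choice of library is also an assumption, not something stated in the source.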
Key words
microblog /
opinion sentence recognition /
opinion mining /
SVM