SONG Jiaying, HE Yu, FU Guohong. Automatic Expansion of Domain-Specific Sentiment Lexicon for Chinese[J]. Journal of Chinese Information Processing, 2015, 29(6): 75-82.
Automatic Expansion of Domain-Specific Sentiment Lexicon for Chinese
SONG Jiaying, HE Yu, FU Guohong
School of Computer Science and Technology, Heilongjiang University, Harbin, Heilongjiang 150080, China
Abstract: In this paper, we combine opinion element normalization with the PolarityRank algorithm to propose a semi-supervised approach to domain-specific Chinese sentiment lexicon expansion. We first extract a set of attribute-evaluation pairs from product reviews. To reduce the complexity of, and the noise in, sentiment lexicon expansion, we then normalize the extracted product attributes using the Jaccard coefficient and normalize their associated evaluations using rules. Finally, we modify the PolarityRank algorithm to automatically identify domain-specific dynamic polar words that are not covered by the original sentiment lexicon. Experimental results on product reviews from the car and mobile-phone domains show that using the expanded set of domain-specific dynamic polar words helps improve polarity classification performance.
Key words: sentiment analysis; sentiment lexicon expansion; PolarityRank; opinion element normalization
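The abstract states that extracted product attributes are normalized with the Jaccard coefficient but gives no further detail. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that similarity is computed over the character sets of two attribute strings, that a hypothetical threshold of 0.5 decides whether two attributes are merged, and that the more frequent attribute is kept as the canonical form. The function names `jaccard` and `normalize_attributes` and the sample attributes are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List


def jaccard(a: str, b: str) -> float:
    """Character-level Jaccard coefficient |A & B| / |A | B| (assumed granularity)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def normalize_attributes(attrs: Iterable[str], threshold: float = 0.5) -> Dict[str, str]:
    """Map each attribute string to a canonical form (hypothetical greedy scheme).

    Attributes are visited from most to least frequent; each one joins the most
    similar existing canonical form if their Jaccard score reaches `threshold`,
    otherwise it becomes a new canonical form itself.
    """
    counts = Counter(attrs)
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    for attr, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        best = max(canonical, key=lambda c: jaccard(attr, c), default=None)
        if best is not None and jaccard(attr, best) >= threshold:
            mapping[attr] = best
        else:
            canonical.append(attr)
            mapping[attr] = attr
    return mapping


# Hypothetical attributes: appearance, appearance design, fuel consumption,
# fuel consumption amount, screen.
attrs = ["外观", "外观", "外观设计", "油耗", "油耗", "耗油量", "屏幕"]
mapping = normalize_attributes(attrs)
print(mapping["外观设计"], mapping["耗油量"], mapping["屏幕"])  # 外观 油耗 屏幕
```

Character sets suit Chinese attribute terms, where variants such as 外观 and 外观设计 share characters. The frequency-ordered greedy pass is only a simple choice for illustration; any clustering over the Jaccard scores would serve the same purpose.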
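The abstract relies on a modified PolarityRank to find dynamic polar words, but the modifications are not described in this section. As a point of reference, the following Python sketch implements only the basic PolarityRank recurrence (Cruz et al.), a PageRank variant on a graph whose edges carry signed weights: each node keeps a positive score PR+ and a negative score PR-, a positive edge passes both scores through unchanged, and a negative edge swaps them. Assumptions: nodes are hypothetical "attribute:evaluation" strings, positive edges link nodes that agree (e.g. joined by "and") and negative edges link contrasting nodes (e.g. joined by "but"), the damping factor is d = 0.85, and the teleport mass is spread uniformly over the hypothetical seed words 好 (good) and 差 (bad).

```python
from typing import Dict, List, Set, Tuple

# (source, target, signed weight): > 0 means "same polarity",
# < 0 means "opposite polarity" (assumed edge semantics).
Edge = Tuple[str, str, float]


def polarity_rank(nodes: List[str], edges: List[Edge], pos_seeds: Set[str],
                  neg_seeds: Set[str], d: float = 0.85, max_iter: int = 200,
                  tol: float = 1e-10) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iterate the basic PolarityRank recurrence (not the paper's modified version)."""
    # Teleport vectors: seed mass spread uniformly over each seed set.
    e_pos = {v: 1.0 / len(pos_seeds) if v in pos_seeds else 0.0 for v in nodes}
    e_neg = {v: 1.0 / len(neg_seeds) if v in neg_seeds else 0.0 for v in nodes}
    # Normaliser: sum of absolute outgoing weights of each source node.
    out_abs = {v: 0.0 for v in nodes}
    for s, _, w in edges:
        out_abs[s] += abs(w)
    pr_pos, pr_neg = dict(e_pos), dict(e_neg)
    for _ in range(max_iter):
        new_pos = {v: (1 - d) * e_pos[v] for v in nodes}
        new_neg = {v: (1 - d) * e_neg[v] for v in nodes}
        for s, t, w in edges:
            if w == 0:
                continue
            share = abs(w) / out_abs[s]
            if w > 0:  # positive edge: polarity propagates unchanged
                new_pos[t] += d * share * pr_pos[s]
                new_neg[t] += d * share * pr_neg[s]
            else:      # negative edge: positive and negative mass swap
                new_pos[t] += d * share * pr_neg[s]
                new_neg[t] += d * share * pr_pos[s]
        delta = max(abs(new_pos[v] - pr_pos[v]) + abs(new_neg[v] - pr_neg[v]) for v in nodes)
        pr_pos, pr_neg = new_pos, new_neg
        if delta < tol:
            break
    return pr_pos, pr_neg


def orientation(pr_pos: Dict[str, float], pr_neg: Dict[str, float]) -> Dict[str, float]:
    """Polarity score in [-1, 1]: (PR+ - PR-) / (PR+ + PR-), or 0 if both are 0."""
    scores = {}
    for v in pr_pos:
        total = pr_pos[v] + pr_neg[v]
        scores[v] = (pr_pos[v] - pr_neg[v]) / total if total > 0 else 0.0
    return scores


# Hypothetical car-domain graph: 高 (high) is a dynamic polar word, good for
# 性价比 (value for money) but bad for 油耗 (fuel consumption).
nodes = ["好", "差", "性价比:高", "油耗:高"]
edges = [
    ("好", "性价比:高", 1.0), ("性价比:高", "好", 1.0),           # joined by "and"
    ("差", "油耗:高", 1.0), ("油耗:高", "差", 1.0),
    ("性价比:高", "油耗:高", -1.0), ("油耗:高", "性价比:高", -1.0),  # joined by "but"
]
pr_pos, pr_neg = polarity_rank(nodes, edges, pos_seeds={"好"}, neg_seeds={"差"})
so = orientation(pr_pos, pr_neg)
print(so)  # 性价比:高 comes out positive, 油耗:高 negative
```

Because each source node's absolute outgoing shares sum to 1 and are scaled by d < 1, the iteration is a contraction and converges. The same evaluation word 高 thus receives opposite polarities depending on the attribute it is paired with, which is the kind of domain-specific dynamic polar word the abstract targets.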
SONG Jiaying (born 1990), master's student; research interest: natural language processing. E-mail: jy_song@outlook.com
HE Yu (born 1988), master's student; research interests: natural language processing and opinion summarization. E-mail: heyucs@yahoo.com
FU Guohong (born 1968), corresponding author, professor; research interests: natural language processing and text mining. E-mail: ghfu@hotmail.com