卢奇,陈文亮. 大规模中文实体情感知识的自动获取[J]. 中文信息学报, 2018, 32(8): 32-41.
LU Qi, CHEN Wenliang. Automatically Building a Large Scale Dictionary of Chinese Entity Sentiment Expressions[J]. Journal of Chinese Information Processing, 2018, 32(8): 32-41.
Automatically Building a Large Scale Dictionary of Chinese Entity Sentiment Expressions
LU Qi (1,2), CHEN Wenliang (1,2)
1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China; 2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Suzhou, Jiangsu 215006, China
Abstract: Apart from a few general sentiment dictionaries, there is a lack of sentiment expressions tied to specific entities, although such expressions are very important for sentiment analysis. This paper proposes a method for automatically building a dictionary of entity sentiment expressions from large-scale raw text. The method first ranks candidate sentiment expressions with a ranking algorithm based on a bipartite graph. A refining algorithm based on semantic similarity then recovers additional expressions from the low-ranked set. Experiments are conducted on three datasets from different domains. The results show that the accuracy of the extracted expressions exceeds 90%. In total, we obtain a large-scale dictionary of about 300K sentiment expressions.
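The abstract does not give the update rule of the bipartite-graph ranking or the details of the refining step, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a HITS-style mutual reinforcement between evaluation targets and candidate expressions, followed by a cosine-similarity filter over the low-ranked candidates. The function names, the `(target, expression, count)` input format, the embedding dictionary `vectors`, and the `threshold` value are all hypothetical.

```python
import math
from collections import defaultdict


def rank_expressions(pairs, iterations=20):
    """Rank candidate expressions on a target-expression bipartite graph.

    `pairs` is an iterable of (target, expression, count) edges.
    Assumption: scores are propagated HITS-style; the paper's actual
    update rule is not stated in the abstract.
    """
    edges = list(pairs)
    t_score = {t: 1.0 for t, _, _ in edges}
    e_score = {e: 1.0 for _, e, _ in edges}

    def normalize(scores):
        # L2-normalize so that scores stay bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(iterations):
        # An expression is strong if it co-occurs with strong targets.
        new_e = defaultdict(float)
        for t, e, w in edges:
            new_e[e] += w * t_score[t]
        e_score = normalize(new_e)
        # A target is strong if it co-occurs with strong expressions.
        new_t = defaultdict(float)
        for t, e, w in edges:
            new_t[t] += w * e_score[e]
        t_score = normalize(new_t)

    return sorted(e_score.items(), key=lambda kv: kv[1], reverse=True)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def refine_low_ranked(high, low, vectors, threshold=0.8):
    """Keep low-ranked candidates that are close to some high-ranked one.

    Assumption: `vectors` maps an expression to an embedding, and a
    candidate is recovered when its best cosine similarity to any
    high-ranked expression reaches `threshold`.
    """
    kept = []
    for cand in low:
        if cand not in vectors:
            continue
        best = max(
            (cosine(vectors[cand], vectors[h]) for h in high if h in vectors),
            default=0.0,
        )
        if best >= threshold:
            kept.append(cand)
    return kept
```

In such a pipeline, the top of the ranked list would be accepted directly, while the remainder would be passed to `refine_low_ranked` together with embeddings trained on the same raw text.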