The Construction of an Emotion Annotated Corpus on Microblog Text
YAO Yuanlin¹, WANG Shuwei¹, XU Ruifeng¹, LIU Bin¹, GUI Lin¹, LU Qin², WANG Xiaolong¹
¹ Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, Guangdong 518055; ² Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Abstract: Research on text emotion analysis has made substantial progress in recent years. However, emotion-annotated corpora remain underdeveloped, especially for microblog text. To support the analysis of emotion expression in Chinese microblog text and the evaluation of emotion classification algorithms, an emotion-annotated corpus of Chinese microblog text is designed and constructed. Based on observation and analysis of how emotions are expressed in microblog text, an emotion annotation specification is developed. Following this specification, annotation is first performed at the microblog level: each microblog is annotated with whether it expresses emotion and, if it does, with the emotion categories it expresses. Annotation is then performed at the sentence level: each sentence is annotated with whether it expresses emotion, its emotion categories, and the strength of each emotion category. The corpus currently consists of 14,000 microblogs totaling 45,431 sentences. It served as the standard resource for the NLP&CC 2013 Chinese microblog emotion analysis evaluation, greatly facilitating research on emotion analysis.
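To make the two-level annotation scheme concrete, the Python sketch below models one annotated microblog. It is a hypothetical illustration, not the corpus's actual file format: the class and field names, the category label "happiness", and the integer strength values are assumptions introduced here. The only rule it enforces is a consistency check assumed from the description of the scheme: a unit (microblog or sentence) marked as expressing emotion carries at least one emotion category, and a unit marked as non-emotional carries none.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical schema for illustration only: field names, category labels,
# and the integer strength scale are assumptions, not the corpus's format.


@dataclass
class SentenceAnnotation:
    text: str
    has_emotion: bool
    # Maps each annotated emotion category to its strength
    # (assumed here to be an integer; the real scale is not specified).
    emotions: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        # Assumed rule: emotional sentences have >= 1 category,
        # non-emotional sentences have none.
        if self.has_emotion != bool(self.emotions):
            raise ValueError(f"inconsistent sentence annotation: {self.text!r}")


@dataclass
class MicroblogAnnotation:
    sentences: List[SentenceAnnotation]
    has_emotion: bool
    # Microblog-level annotation records categories only, without strength.
    emotions: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Check every sentence first, then the microblog-level labels.
        for sentence in self.sentences:
            sentence.validate()
        if self.has_emotion != bool(self.emotions):
            raise ValueError("inconsistent microblog-level annotation")


if __name__ == "__main__":
    post = MicroblogAnnotation(
        sentences=[
            SentenceAnnotation("first sentence", True, {"happiness": 2}),
            SentenceAnnotation("second sentence", False),
        ],
        has_emotion=True,
        emotions=["happiness"],
    )
    post.validate()  # passes: both levels are internally consistent
    print(len(post.sentences), post.emotions)
```

Keeping strength only on the sentence level mirrors the abstract, which lists strength among the sentence-level annotations but not the microblog-level ones.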
Received: 25 June 2014
|
|
|
|
|
|
|
|