The Construction of an Emotion Annotated Corpus on Microblog Text

YAO Yuanlin, WANG Shuwei, XU Ruifeng, LIU Bin, GUI Lin, LU Qin, WANG Xiaolong

Journal of Chinese Information Processing, 2014, Vol. 28, Issue 5: 83-91.
Language Resource Construction

The Construction of an Emotion Annotated Corpus on Microblog Text

  • YAO Yuanlin1, WANG Shuwei1, XU Ruifeng1, LIU Bin1, GUI Lin1, LU Qin2, WANG Xiaolong1

Abstract

Research on text emotion analysis has made substantial progress in recent years. However, emotion annotated corpora for Chinese remain underdeveloped, especially those built on micro-blog text. To support the analysis of emotion expression in Chinese micro-blog text and the evaluation of emotion analysis algorithms, this paper designs and constructs an emotion annotated corpus of Chinese micro-blog text. Based on close observation and analysis of how emotion is expressed in micro-blog text, a complete emotion annotation specification is developed. Following this specification, annotation is first performed at the micro-blog level: each micro-blog is labeled for whether it expresses emotion and, if so, with the emotion categories it contains, using multi-label annotation. Annotation is then performed at the sentence level: each sentence is labeled for whether it expresses emotion and with its emotion categories, and the strength of each annotated category is recorded. The corpus currently consists of 14,000 micro-blogs totaling 45,431 sentences. It served as the standard resource for the NLP&CC2013 Chinese micro-blog emotion analysis evaluation, substantially advancing research on micro-blog emotion analysis.
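The two annotation levels described in the abstract can be pictured as a small data structure. The following is a minimal sketch only, not the corpus's actual file format: the class names, field names, the example category labels ("happiness", "surprise"), and the integer strength values are all assumptions made for illustration, since the abstract does not specify the category inventory or the strength scale.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SentenceAnnotation:
    """Sentence-level annotation (hypothetical layout)."""
    text: str
    # Maps each emotion category to its strength.
    # An empty dict means the sentence expresses no emotion.
    emotions: Dict[str, int] = field(default_factory=dict)

    @property
    def has_emotion(self) -> bool:
        return bool(self.emotions)


@dataclass
class MicroblogAnnotation:
    """Micro-blog-level annotation (hypothetical layout)."""
    sentences: List[SentenceAnnotation]
    # Multi-label: a micro-blog may carry several emotion categories.
    # An empty list means the micro-blog expresses no emotion.
    emotions: List[str] = field(default_factory=list)

    @property
    def has_emotion(self) -> bool:
        return bool(self.emotions)


def corpus_stats(corpus: List[MicroblogAnnotation]) -> Dict[str, int]:
    """Count micro-blogs, sentences, and micro-blogs that express emotion."""
    return {
        "microblogs": len(corpus),
        "sentences": sum(len(m.sentences) for m in corpus),
        "emotional_microblogs": sum(1 for m in corpus if m.has_emotion),
    }


# Illustrative record; the text and labels are invented placeholders.
example = MicroblogAnnotation(
    sentences=[
        SentenceAnnotation("What a great day!", {"happiness": 2}),
        SentenceAnnotation("The meeting starts at nine."),
    ],
    emotions=["happiness"],
)
```

Keeping the micro-blog labels separate from the sentence labels mirrors the two annotation passes in the abstract, where the micro-blog level is annotated first and the sentence level afterwards.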

Key words

emotion corpus / corpus construction / emotion annotation / micro-blog text

Cite this article

YAO Yuanlin, WANG Shuwei, XU Ruifeng, LIU Bin, GUI Lin, LU Qin, WANG Xiaolong. The Construction of an Emotion Annotated Corpus on Microblog Text. Journal of Chinese Information Processing. 2014, 28(5): 83-91

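Funding

National Natural Science Foundation of China (61203378, 61300112, 61370165); Specialized Research Fund for the Doctoral Program of Higher Education (20122302120070); Natural Science Foundation of Guangdong Province (S2012040007390, S2013010014475); Open Project Fund of the National Laboratory of Pattern Recognition; Shenzhen Basic Research Program (JCYJ20120613152557576, JC201005260118A); Shenzhen International Cooperation Program (GJHZ2012061311061217); Baidu University Collaboration Project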