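Corpus Construction on Opinion Information Extraction in Chinese

DAI Min, ZHU Zhu, LI Shoushan, ZHOU Guodong

Journal of Chinese Information Processing ›› 2015, Vol. 29 ›› Issue (4) : 67-73.
Sentiment Analysis and Social Computing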

Abstract

Opinion information extraction (OIE) is an important subtask of sentiment analysis. Although the task has been studied for some time, research on OIE for Chinese text has only just begun. The most pressing difficulty at present is that relevant Chinese corpora remain very limited. To better support research on Chinese OIE, this paper focuses on the annotation scheme for Chinese corpora and constructs a relatively large Chinese OIE corpus with rich annotation types. In addition to the commonly annotated elements, such as sentiment orientation, opinion targets, and opinion words, the corpus specifically annotates opinion target ellipsis, opinionated sentences that contain no sentiment words, and polarity shifting. Corpus statistics show that these special phenomena (e.g., opinion target ellipsis) are very common in Chinese opinion expressions, so research on them is necessary. The corpus is intended to provide a data foundation for Chinese OIE.
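The annotation layers named in the abstract can be pictured as one record per opinion sentence. The following is only an illustrative sketch: the class name `OpinionAnnotation`, its field names, the polarity labels, and the example sentence are all assumptions made here, not the paper's actual annotation format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpinionAnnotation:
    """Hypothetical record for one annotated sentence (not the paper's format)."""

    sentence: str
    polarity: str                       # sentiment orientation, e.g. "positive" or "negative"
    target: Optional[str] = None        # opinion target; None when the target is omitted
    opinion_word: Optional[str] = None  # sentiment word; None when the sentence has none
    polarity_shifted: bool = False      # True when context (e.g. negation) flips the polarity

    @property
    def target_ellipsis(self) -> bool:
        # The sentence expresses an opinion without naming its target.
        return self.target is None

    @property
    def no_sentiment_word(self) -> bool:
        # The sentence expresses an opinion without an explicit sentiment word.
        return self.opinion_word is None


# Invented example: "很好用" ("very easy to use") names no opinion target.
example = OpinionAnnotation(sentence="很好用", polarity="positive", opinion_word="好用")
print(example.target_ellipsis)
```

Deriving the ellipsis and no-sentiment-word flags from empty fields, rather than storing them separately, keeps a record from contradicting itself; a real annotation scheme may well record them explicitly.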

Key words

sentiment analysis / opinion information extraction / Chinese corpus

Cite this article

DAI Min, ZHU Zhu, LI Shoushan, ZHOU Guodong. Corpus Construction on Opinion Information Extraction in Chinese. Journal of Chinese Information Processing. 2015, 29(4): 67-73

Funding

National Natural Science Foundation of China (61003155, 60873150); Open Project Fund of the National Laboratory of Pattern Recognition