Abstract
Sentiment analysis of microblogs is a form of fine-grained mining of microblog content and has significant research value. Extracting opinion targets is one of its key problems. To improve the accuracy of opinion target extraction from Chinese microblogs, this paper builds on an analysis of the characteristics of Chinese microblogs and on the construction of a microblog comment ontology, selects features from four aspects (words, parts of speech, sentiment words, and ontology), and uses a CRFs model to extract the opinion targets. Applied to the Task5 opinion target extraction task of the COAE2014 evaluation, the method achieves a macro-averaged accuracy of 61.20%, ranking first among all participating teams. The experimental results show that introducing ontology features into the CRFs model effectively improves the accuracy of opinion target extraction.
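The four feature groups named in the abstract (word, part of speech, sentiment word, ontology) can be illustrated with a minimal sketch of per-token feature extraction for a sequence labeler. Everything concrete here is an assumption made for illustration: the toy lexicon `SENTIMENT_WORDS`, the toy mapping `ONTOLOGY_CONCEPTS`, the function names, the one-token context window, and the `B-TARGET`/`O` labels do not come from the paper, which does not state its feature templates or CRF toolkit in this abstract.

```python
# Toy resources: every entry is an illustrative assumption, not data from the paper.
SENTIMENT_WORDS = {"好", "差", "喜欢"}
ONTOLOGY_CONCEPTS = {"屏幕": "component", "电池": "component", "手机": "product"}


def token_features(sentence, i):
    """Build a feature dict for the i-th (word, pos) pair of a sentence."""
    word, pos = sentence[i]
    features = {
        "word": word,                                        # word feature
        "pos": pos,                                          # part-of-speech feature
        "is_sentiment": word in SENTIMENT_WORDS,             # sentiment-word feature
        "onto_class": ONTOLOGY_CONCEPTS.get(word, "none"),   # ontology feature
    }
    # Add a context window of one token on each side (hypothetical choice).
    if i > 0:
        prev_word, prev_pos = sentence[i - 1]
        features["prev_word"] = prev_word
        features["prev_pos"] = prev_pos
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        next_word, next_pos = sentence[i + 1]
        features["next_word"] = next_word
        features["next_pos"] = next_pos
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Return one feature dict per token, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# A segmented, POS-tagged toy sentence: "手机 的 屏幕 很 好"
example = [("手机", "n"), ("的", "u"), ("屏幕", "n"), ("很", "d"), ("好", "a")]
# Hypothetical gold labels marking "屏幕" as the opinion target.
labels = ["O", "O", "B-TARGET", "O", "O"]

X = sentence_features(example)
```

A list of per-token dicts paired with a label sequence is a common input shape for CRF toolkits, so a sketch like this could feed a trainer of the reader's choice. The point of the ontology column is that a word absent from the training data can still be recognized as a likely target through its concept class.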
Key words
CRFs model /
ontology /
feature selection /
opinion target extraction /
information extraction
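Funding
National Natural Science Foundation of China (71303111, 71103085, 71403121); National Social Science Foundation of China (15BTQ063, 14AZD084); Fundamental Research Funds for the Central Universities (30916011330)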