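Opinion target extraction for Chinese sentences is the task of extracting, from a Chinese sentence, the object that a comment is directed at or an attribute of that object. Existing work in China has not yet been able to identify compound-word opinion targets or out-of-vocabulary (unknown) opinion targets effectively. To address these two cases, this paper proposes an opinion target extraction method for Chinese sentences based on cascaded conditional random fields (CRFs). The method first obtains a set of candidate opinion targets with a low-level CRF; a noise reduction model then filters out noise, a complement model adds missing candidate targets, and a merge model combines candidates that form compound phrases; finally, a high-level model extracts the opinion targets. Experimental results show that, compared with a recognition method based on linear-chain CRFs, the method improves precision, recall, and F1 by 1.62%, 5.75%, and 4.17%, respectively, and that it effectively identifies compound-word and out-of-vocabulary opinion targets, thereby improving the accuracy of opinion target recognition in Chinese sentences.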
Abstract
Sentiment-object extraction aims to identify the targets of the opinions expressed in sentiment sentences. However, previous research fails to extract compound targets and unknown (out-of-vocabulary) targets. In this paper, a cascaded CRFs model is presented to deal with these problems. The method first acquires a candidate opinion target set using a lower-level CRFs model. Then, middle-level models refine the candidate set by filtering noise, complementing missing candidate targets, and merging compound noun phrases. Finally, the opinion target set is extracted by the higher-level model, which takes the middle-level candidate set as input. Experiments show that our method outperforms linear-chain CRFs by 1.62% in precision, 5.75% in recall, and 4.17% in F1 measure. The method is also effective at identifying compound targets and unknown targets.
Key words: sentiment-objects; cascaded conditional random fields; noise reduction model; complement model
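
The three-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two CRF layers are replaced by plain callables (`low_level`, `high_level`), and the noise reduction, complement, and merge steps are hypothetical rules (a stopword filter, a lexicon lookup, and adjacent-span merging) chosen only to show how data could flow between the layers. All function names, the stopword set, and the lexicon are assumptions.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) token indices of one candidate target


def filter_noise(tokens: List[str], spans: List[Span], stopwords: Set[str]) -> List[Span]:
    """Hypothetical noise reduction: drop candidates made only of stopwords."""
    return [(s, e) for s, e in spans
            if not all(tokens[i] in stopwords for i in range(s, e))]


def complement(tokens: List[str], spans: List[Span], lexicon: Set[str]) -> List[Span]:
    """Hypothetical complement step: add lexicon tokens not yet covered."""
    covered = {i for s, e in spans for i in range(s, e)}
    extra = [(i, i + 1) for i, tok in enumerate(tokens)
             if tok in lexicon and i not in covered]
    return sorted(spans + extra)


def merge_adjacent(spans: List[Span]) -> List[Span]:
    """Hypothetical merge step: join touching candidates into one compound."""
    merged: List[Span] = []
    for s, e in sorted(spans):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def cascade(tokens: List[str],
            low_level: Callable[[List[str]], List[Span]],
            high_level: Callable[[List[str], List[Span]], List[Span]],
            stopwords: Set[str],
            lexicon: Set[str]) -> List[Span]:
    """Low-level candidates -> filter, complement, merge -> high-level decision."""
    spans = low_level(tokens)
    spans = filter_noise(tokens, spans, stopwords)
    spans = complement(tokens, spans, lexicon)
    spans = merge_adjacent(spans)
    return high_level(tokens, spans)


# Toy stand-ins for the two CRF layers (assumptions, not trained models).
tokens = ["这", "款", "手机", "屏幕", "的", "分辨率", "很", "高"]
toy_low = lambda toks: [(2, 3), (4, 5)]      # proposes "手机" and the noisy "的"
toy_high = lambda toks, spans: spans         # accepts every refined candidate

targets = cascade(tokens, toy_low, toy_high,
                  stopwords={"的"}, lexicon={"屏幕", "分辨率"})
print(["".join(tokens[s:e]) for s, e in targets])
```

In this toy run the noisy candidate "的" is filtered out, "屏幕" and "分辨率" are added from the lexicon, and the adjacent "手机" and "屏幕" are merged into one compound candidate. In the paper each of these stages is a learned model rather than a fixed rule.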
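Funding
Supported by the Natural Science Foundation of Fujian Province (2010J05133); the Fujian Provincial Science and Technology Innovation Platform Program (2009J1007); and the Science and Technology Development Fund of Fuzhou University (2010-XQ-22).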