Research on Cross-Domain Opinion Analysis

WU Qiong1,2, TAN Songbo1, ZHANG Gang1, DUAN Miyi1, CHENG Xueqi1

Journal of Chinese Information Processing, 2010, Vol. 24, Issue 1: 77-84.
Survey

Abstract

This paper studies opinion analysis of documents, i.e., determining whether the overall opinion expressed in a given document is positive or negative. Existing studies have shown that supervised classification approaches perform well on this task. In most cases, however, the available labeled data and the data to be classified do not belong to the same domain, and the performance of supervised classifiers drops sharply when a model is transferred from the labeled (old) domain to a target (new) domain without labeled data. This raises the problem of cross-domain opinion analysis. To address it, we propose an iterative algorithm that integrates the opinion orientations of documents into a graph-ranking algorithm: the graph-ranking step uses the accurate labels of the old-domain (training) documents together with the “pseudo” labels of the new-domain (test) documents. After obtaining the final result of this iteration, we select the test documents whose opinions have been determined relatively accurately, use them as “seeds”, and iterate further with the EM algorithm in order to improve accuracy over the whole test set. Experimental results indicate that the proposed approach substantially improves the accuracy of cross-domain opinion analysis.
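
The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes non-negative term-weight vectors, cosine similarity as the edge weight, a hypothetical mixing weight `alpha` between the two domains, and a hypothetical `fraction` for choosing seeds. Training labels stay fixed at +1 or -1, while test scores start at 0 and are refined as pseudo-labels on each pass.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate(train_vecs, train_labels, test_vecs, alpha=0.5, iters=50, tol=1e-6):
    """Iteratively score test documents; positive means positive opinion.

    alpha (an assumption, not from the paper) weights the contribution of
    the labeled training domain against the test-domain pseudo-labels.
    """
    n = len(test_vecs)
    # Edge weights from each test document to every training document.
    sim_train = [[cosine(t, s) for s in train_vecs] for t in test_vecs]
    # Edge weights among test documents, with no self-loops.
    sim_test = [
        [cosine(t, s) if i != j else 0.0 for j, s in enumerate(test_vecs)]
        for i, t in enumerate(test_vecs)
    ]
    scores = [0.0] * n  # pseudo-labels, initially neutral
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            z_train = sum(sim_train[i]) or 1.0
            z_test = sum(sim_test[i]) or 1.0
            # Similarity-weighted average of the fixed training labels.
            from_train = sum(w * y for w, y in zip(sim_train[i], train_labels)) / z_train
            # Similarity-weighted average of the current test pseudo-labels.
            from_test = sum(w * s for w, s in zip(sim_test[i], scores)) / z_test
            new_scores.append(alpha * from_train + (1 - alpha) * from_test)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once the scores no longer change
            break
    return scores


def pick_seeds(scores, fraction=0.5):
    """Indices of the test documents with the largest absolute score."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])


# Toy data: two labeled training documents and four unlabeled test documents.
train_vecs = [[1.0, 0.0], [0.0, 1.0]]
train_labels = [1, -1]
test_vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

scores = propagate(train_vecs, train_labels, test_vecs)
seeds = pick_seeds(scores, fraction=0.5)
```

In this sketch the seeds are simply the test documents whose scores lie furthest from zero. The EM stage that the abstract describes would start from such seeds and is not shown here.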

Key words

computer application / Chinese information processing / cross domain / opinion analysis / graph ranking / EM algorithm

Cite this article

WU Qiong, TAN Songbo, ZHANG Gang, DUAN Miyi, CHENG Xueqi. Research on Cross-Domain Opinion Analysis. Journal of Chinese Information Processing. 2010, 24(1): 77-84

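Funding

Supported by the National Natural Science Foundation of China (60803085, 60933005); the National High Technology Research and Development Program of China (863 Program) (2006AA010105-02, 2007AA01Z416, 2007AA01Z441); and the National Basic Research Program of China (973 Program) (2007CB311100).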