Abstract
Traditional approaches to subjective sentence identification rely on the bag-of-words model, which assumes that terms are strongly independent of one another. In contrast, this paper proposes a graph-based method built on term co-occurrence relations. The method constructs a term co-occurrence graph and uses the co-occurrence and syntactic relations between terms to describe how term distributions differ between the sets of subjective and non-subjective sentences. It also adopts an indegree-based term weighting scheme to compute term feature values. Experiments on a benchmark corpus show that the term-relation graph model significantly improves the accuracy of Chinese subjective sentence identification over current bag-of-words methods.
Key words: word co-occurrence; graph model; subjective sentence identification; feature value; supervised learning
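The abstract's idea of weighting terms by their indegree in a co-occurrence graph can be illustrated with a minimal sketch. This is not the authors' implementation: the sliding-window size, the edge direction (from a term to the terms that follow it), the pre-tokenized input, and the function names `build_cooccurrence_graph` and `indegree_weights` are all assumptions made for illustration. The syntactic relations mentioned in the abstract are not modeled here.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=3):
    """Build a directed co-occurrence graph over one tokenized sentence.

    Assumption: an edge u -> v is added when v appears after u within a
    span of `window` tokens (u plus the next window - 1 tokens).
    """
    edges = defaultdict(set)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:  # skip self-loops
                edges[u].add(v)
    return edges


def indegree_weights(tokens, window=3):
    """Weight each term by the number of distinct terms pointing to it."""
    edges = build_cooccurrence_graph(tokens, window)
    indegree = {t: 0 for t in tokens}
    for targets in edges.values():
        for v in targets:
            indegree[v] += 1
    return indegree


if __name__ == "__main__":
    # Hypothetical pre-segmented sentence; real Chinese input would need
    # a word segmenter first.
    sentence = ["the", "screen", "is", "really", "great"]
    print(indegree_weights(sentence, window=3))
```

Unlike a raw term frequency, the indegree depends on how many distinct neighbors precede a term, so it reflects context rather than treating terms as independent. Computing these weights separately over the subjective and non-subjective training sets would be one way to expose the distribution difference the abstract refers to.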
Funding
National Natural Science Foundation of China (61272212, 61163006, 61203313, 61365002, 61402208)