YANG Yuan, MA Yunlong, LIN Hongfei. Clustering Product Features in Opinion Mining[J]. Journal of Chinese Information Processing, 2012, 26(3): 104-109.
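Clustering Product Features in Opinion Mining
YANG Yuan, MA Yunlong, LIN Hongfei
Information Retrieval Laboratory, Dalian University of Technology, Dalian, Liaoning 116024, China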
Abstract: This paper addresses the problem of grouping the different expressions of product features found in reviews. In product reviews, the same feature is often described with different expressions; for example, the “appearance” and the “design” of a mobile phone refer to the same feature. Because different expressions of the same feature tend to occur with the same sentiment words in a sentence, the method first extracts (feature expression, sentiment word) pairs and builds a bipartite graph from them, then applies a weight-normalized SimRank to compute the similarity between feature expressions in this graph, and finally uses these similarities to improve a Bayesian classifier trained by semi-supervised learning. Experimental results show that the proposed method is effective.
Key words: product features; grouping; SimRank; semi-supervised learning
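The core of the method is SimRank over the feature/sentiment-word bipartite graph: two feature expressions are similar if they are modified by similar sentiment words, and two sentiment words are similar if they modify similar features. The Python sketch below illustrates that recurrence under stated assumptions. The abstract does not define "weight-normalized", so the sketch assumes each neighbor pair contributes the product of the two nodes' normalized co-occurrence weights in place of plain SimRank's uniform 1/(|N(a)||N(b)|) factor. The decay constants (0.8), the iteration count, the helper names (build_bipartite, normalize, weighted_simrank), and the sample review pairs are all hypothetical, and the semi-supervised Bayesian classification step is not shown.

```python
from collections import defaultdict
from itertools import product


def build_bipartite(pairs):
    """Count (feature expression, sentiment word) co-occurrences.

    Edges are stored from both sides so that each side can be
    normalized independently.
    """
    f2o = defaultdict(lambda: defaultdict(float))
    o2f = defaultdict(lambda: defaultdict(float))
    for feature, opinion in pairs:
        f2o[feature][opinion] += 1.0
        o2f[opinion][feature] += 1.0
    return f2o, o2f


def normalize(adj):
    """Rescale each node's edge weights so they sum to 1."""
    out = {}
    for node, nbrs in adj.items():
        total = sum(nbrs.values())
        out[node] = {n: w / total for n, w in nbrs.items()}
    return out


def _update(nodes, weights, other_sim, decay):
    """One SimRank step for one side of the bipartite graph.

    Each neighbor pair (i, j) contributes weights[a][i] * weights[b][j]
    (the assumed weight normalization) instead of 1 / (|N(a)| * |N(b)|).
    """
    new = {}
    for a, b in product(nodes, nodes):
        if a == b:
            new[a, b] = 1.0  # a node is always fully similar to itself
        else:
            new[a, b] = decay * sum(
                wa * wb * other_sim[i, j]
                for i, wa in weights[a].items()
                for j, wb in weights[b].items()
            )
    return new


def weighted_simrank(pairs, c1=0.8, c2=0.8, iterations=10):
    """Hypothetical helper: similarities between feature expressions."""
    f2o, o2f = build_bipartite(pairs)
    pf, po = normalize(f2o), normalize(o2f)
    feats, opins = sorted(pf), sorted(po)
    # Start from the identity: each node is similar only to itself.
    sf = {(a, b): float(a == b) for a, b in product(feats, feats)}
    so = {(a, b): float(a == b) for a, b in product(opins, opins)}
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's scores.
        sf, so = _update(feats, pf, so, c1), _update(opins, po, sf, c2)
    return sf


# Hypothetical (feature expression, sentiment word) pairs from phone reviews.
PAIRS = [
    ("appearance", "beautiful"), ("appearance", "stylish"),
    ("design", "beautiful"), ("design", "stylish"),
    ("battery", "durable"),
    ("battery life", "durable"), ("battery life", "long"),
]

if __name__ == "__main__":
    sim = weighted_simrank(PAIRS)
    print(round(sim["appearance", "design"], 3))
    print(round(sim["appearance", "battery"], 3))
```

With these sample pairs, “appearance” and “design” share both of their sentiment words, so their score rises toward 2/3 under a decay of 0.8, while “appearance” and “battery” are not connected through any sentiment word and stay at 0. Updating both sides from the previous iteration keeps the two recurrences symmetric.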