Abstract: With the rapid development of e-commerce, product review mining has recently received a lot of attention. In product reviews, people often use different words and phrases to describe the same product feature, and these expressions must be recognized as synonyms to produce an effective opinion summary. In this paper, we first calculate the similarity of product features. Then must-link and cannot-link constraints are extracted based on an analysis of product reviews. Finally, a constrained hierarchical clustering algorithm is applied with the extracted constraints to recognize product feature synonyms. Experiments on diverse real-life datasets show promising results.
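The three steps named in the abstract (pairwise similarity, constraint extraction, constrained hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the average-link merge criterion, the stopping `threshold`, the function name `constrained_clustering`, and all similarity values and constraint pairs below are assumptions made up for illustration.

```python
from itertools import combinations


def constrained_clustering(features, sim, must_link, cannot_link, threshold=0.5):
    """Group feature expressions into synonym clusters (illustrative sketch).

    features:    list of feature expressions (strings)
    sim:         dict mapping frozenset({a, b}) -> similarity in [0, 1]
    must_link:   pairs that have to end up in the same cluster
    cannot_link: pairs that may never share a cluster
    threshold:   assumed stopping rule: stop when the best average
                 similarity between two clusters falls below this value
    """
    clusters = [{f} for f in features]

    def find(f):
        # Return the cluster that currently contains feature f.
        return next(c for c in clusters if f in c)

    # Must-link pairs are merged up front, before any similarity is used.
    for a, b in must_link:
        ca, cb = find(a), find(b)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb

    cannot = {frozenset(pair) for pair in cannot_link}

    def blocked(c1, c2):
        # A merge is forbidden if it would join any cannot-link pair.
        return any(frozenset((a, b)) in cannot for a in c1 for b in c2)

    def avg_sim(c1, c2):
        # Average-link similarity (an assumption); missing pairs count as 0.
        total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        candidates = [
            (avg_sim(c1, c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
            if not blocked(c1, c2)
        ]
        if not candidates:
            break
        best, i, j = max(candidates)
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical input data, invented for this example only.
features = ["screen", "display", "lcd", "battery", "battery life", "price"]
sim = {
    frozenset(("screen", "display")): 0.8,
    frozenset(("display", "lcd")): 0.6,
    frozenset(("screen", "lcd")): 0.55,
    frozenset(("screen", "battery")): 0.6,
}
must_link = [("battery", "battery life")]
cannot_link = [("screen", "battery")]

groups = constrained_clustering(features, sim, must_link, cannot_link)
for group in groups:
    print(sorted(group))
```

In this toy run the must-link pair is joined first, the cannot-link pair keeps the screen-related group away from the battery group regardless of their similarity score, and merging stops once no remaining pair of clusters reaches the threshold.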