Abstract
Collaborative filtering algorithms typically produce personalized recommendations from the similarity between items or between users, but data sparsity distorts the similarity computation and often leads to poor recommendation accuracy. Most traditional recommendation algorithms consider only the users' overall ratings of items and ignore the review texts, which carry valuable information about users' preferences for individual attribute aspects of the items. This paper proposes a sentiment-analysis-based recommendation algorithm, SACF (reviews sentiment analysis for collaborative filtering), which builds on classical collaborative filtering and takes into account the influence of review texts on the similarity computation. SACF uses the LDA topic model to extract K latent attribute aspects of the items, computes user similarity from the users' sentiment preferences on each aspect, and builds the recommendation model from that similarity. Experiments on a Jingdong online review dataset show that SACF not only alleviates the data sparsity problem of traditional collaborative filtering but also improves recommendation accuracy.
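The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes that each user has already been reduced to a K-dimensional vector of sentiment scores, one per latent aspect; the LDA and sentiment-analysis stages are not shown. The abstract does not state which similarity measure or prediction formula SACF uses, so cosine similarity and a mean-centered weighted average are assumptions here, and all names and data (`aspect_prefs`, `ratings`, `predict_rating`, the users and items) are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length aspect-sentiment vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no aspect sentiment is treated as dissimilar
    return dot / (norm_u * norm_v)


def mean_rating(user_ratings):
    """Average of one user's known ratings."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict_rating(target, item, aspect_prefs, ratings):
    """User-based CF prediction with similarity taken from aspect sentiment.

    Assumption: a mean-centered weighted average over the users who rated
    `item`, weighted by aspect-sentiment similarity to `target`.
    """
    target_mean = mean_rating(ratings[target])
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        sim = cosine_similarity(aspect_prefs[target], aspect_prefs[other])
        numerator += sim * (other_ratings[item] - mean_rating(other_ratings))
        denominator += abs(sim)
    if denominator == 0:
        return target_mean  # no usable neighbors: fall back to the user's mean
    return target_mean + numerator / denominator


# Hypothetical toy data: K = 3 aspects, sentiment scores in [-1, 1].
aspect_prefs = {
    "alice": [0.8, -0.2, 0.5],
    "bob": [0.7, -0.1, 0.6],
    "carol": [-0.6, 0.9, -0.3],
}
ratings = {
    "alice": {"phone": 5, "case": 3},
    "bob": {"phone": 4, "case": 3, "tablet": 5},
    "carol": {"phone": 2, "case": 4, "tablet": 1},
}

prediction = predict_rating("alice", "tablet", aspect_prefs, ratings)
print(f"Predicted rating of 'tablet' for alice: {prediction:.2f}")
```

Because the similarity is computed from aspect-level sentiment rather than from co-rated items, two users can be compared even when they have rated no items in common, which is how an approach of this kind can ease the sparsity problem. The prediction in this sketch is not clipped to the rating scale.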
Key words
recommender system /
collaborative filtering /
LDA /
sentiment analysis
Funding
National Natural Science Foundation of China (61472291, 61303115)