Abstract
Recommender systems are an important tool for overcoming information overload, and collaborative filtering is the most popular approach. This paper presents LDA-CF, a hybrid collaborative filtering method that combines latent factor models with neighborhood methods. First, we convert the rating matrix into a collection of pseudo-documents and use the LDA (Latent Dirichlet Allocation) topic model to discover latent factor vectors for users and items. Then we compute user and item similarities in the low-dimensional latent factor space. Finally, we use a neighborhood method to predict unobserved ratings. Experiments on the MovieLens 100k dataset show that LDA-CF outperforms traditional neighborhood methods in terms of MAE on the rating prediction task. LDA can therefore effectively discover, from the rating matrix, low-dimensional user and item representations that are useful for computing similarity, which alleviates the data sparsity problem to some extent.
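The three steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how ratings are encoded as pseudo-documents, so the sketch assumes one document per user in which an item rated r appears r times; the LDA fitting step is omitted and `theta` holds hypothetical user-topic vectors standing in for its output; and the prediction uses a common mean-centered, similarity-weighted neighborhood formula that may differ from the one used in LDA-CF. All function names are illustrative.

```python
import math


def to_pseudo_documents(ratings):
    """Build one pseudo-document per user from {user: {item: rating}}.

    Assumption (not from the abstract): an item rated r is repeated r times,
    so higher-rated items behave like more frequent "words".
    """
    return {
        user: [item for item, r in items.items() for _ in range(int(r))]
        for user, items in ratings.items()
    }


def cosine(u, v):
    """Cosine similarity between two latent factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(user, item, ratings, theta, k=2):
    """User-based neighborhood prediction with similarities taken from theta."""

    def mean(u):
        return sum(ratings[u].values()) / len(ratings[u])

    # Keep the k most similar users who have actually rated the item.
    neighbors = sorted(
        ((cosine(theta[user], theta[v]), v)
         for v in ratings if v != user and item in ratings[v]),
        reverse=True,
    )[:k]
    denom = sum(abs(s) for s, _ in neighbors)
    if denom == 0:
        return mean(user)  # fall back to the user's own average
    offset = sum(s * (ratings[v][item] - mean(v)) for s, v in neighbors)
    return mean(user) + offset / denom


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Toy rating matrix (made-up values for illustration only).
ratings = {
    "u1": {"i1": 5, "i2": 4},
    "u2": {"i1": 4, "i2": 5, "i3": 5},
    "u3": {"i1": 1, "i3": 2},
}
# Hypothetical user-topic vectors; a fitted LDA model would supply these.
theta = {"u1": [0.9, 0.1], "u2": [0.8, 0.2], "u3": [0.1, 0.9]}

docs = to_pseudo_documents(ratings)
pred = predict("u1", "i3", ratings, theta)
```

In this toy setup `u1` is much closer to `u2` than to `u3` in the latent space, so the prediction for `i3` is pulled mainly toward `u2`'s above-average rating. Computing similarity on dense topic vectors rather than on raw co-rated items is what lets two users be compared even when they share few rated items.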
Key words
recommender systems /
collaborative filtering /
topic model /
LDA
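Funding
National Natural Science Foundation of China (61272240, 60970047, 61103151); Doctoral Program Foundation of the Ministry of Education (20110131110028); Humanities and Social Sciences Foundation of the Ministry of Education (12YJC630211); Natural Science Foundation of Shandong Province (ZR2012FM037); Shandong Province Outstanding Young and Middle-aged Scientists Research Award Fund (BS2012DX012, BS2012DX017); Shandong University Graduate Independent Innovation Fund (YZC12084)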