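Long-tail products are products whose individual sales are low but which, because of their great variety, add up to a large cumulative sales volume and can widen a company's profit margin. On e-commerce websites, user information is limited, users buy few long-tail products, and the data are sparse, so predicting users' long-tail purchase behavior is challenging. This paper proposes predicting the proportion of long-tail products among a user's purchases, studying each individual user's overall preference for long-tail products. Drawing on the massive text and rich personal information on social media websites, it extracts several kinds of user features, including personal attributes, text semantics, following relationships, and active time; it applies an improved iterative regression tree model, MART (Multiple Additive Regression Tree), to predict and analyze users' long-tail purchase behavior; and, taking JingDong and Sina Weibo as the e-commerce and social media websites respectively, it builds regression experiments on real data and obtains some meaningful findings. By extracting user features from social media websites, the paper offers a novel approach to predicting long-tail purchase behavior, which can help to better understand users' personalized needs, tap the latent economic value of the long-tail market, and improve e-commerce services.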
Abstract
Long-tail products, each in low demand, together account for a significant share of total revenue. Analyzing long-tail purchase behavior is challenging because of the data sparsity that results from the small number of such purchases. This paper proposes leveraging online social media information to predict long-tail purchase behavior. Specifically, we build user profiles from social media information, including status text, following links, and temporal activity distributions, and predict users' purchases with a weighted Multiple Additive Regression Trees (MART) model. Experiments on data from JingDong and Sina Weibo demonstrate the effectiveness of the proposed method and reveal several interesting findings.
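The prediction target, the share of a user's purchases that fall on long-tail products, can be illustrated with a minimal Python sketch. The toy purchase log, the function names `long_tail_items` and `long_tail_ratio`, and the 80% head cutoff are all assumptions made for illustration; the abstract does not state how head and tail products are separated.

```python
from collections import Counter


def long_tail_items(purchases, head_share=0.8):
    """Return the set of items that lie outside the 'head'.

    Items are ranked by sales volume; the head is the smallest group of
    best sellers covering at least `head_share` of all sales. The 0.8
    cutoff is an illustrative assumption, not a value from the paper.
    """
    sales = Counter(item for _, item in purchases)
    total = sum(sales.values())
    tail, covered = set(), 0
    for item, count in sales.most_common():
        if covered >= head_share * total:
            tail.add(item)  # the head already covers enough sales
        covered += count
    return tail


def long_tail_ratio(purchases, tail):
    """Per-user fraction of purchases that fall on long-tail items."""
    bought, bought_tail = Counter(), Counter()
    for user, item in purchases:
        bought[user] += 1
        if item in tail:
            bought_tail[user] += 1
    return {user: bought_tail[user] / n for user, n in bought.items()}


# Hypothetical (user, item) purchase log.
purchases = [
    ("u1", "A"), ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "A"), ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "A"), ("u3", "B"), ("u3", "C"), ("u3", "D"),
]
tail = long_tail_items(purchases)
ratios = long_tail_ratio(purchases, tail)
print(sorted(tail))  # ['C', 'D']
print(ratios)        # {'u1': 0.0, 'u2': 0.0, 'u3': 0.4}
```

In this toy log only `u3` buys tail items, so its ratio is 2/5. A regression model such as MART would then be trained to predict this per-user ratio from social media features.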
Key words
long-tail products /
e-commerce shopping /
social media /
purchase prediction
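Funding
National Natural Science Foundation of China, Young Scientists Fund (61502502); National Key Basic Research Program of China (2014CB340403); Beijing Natural Science Foundation (4162032); Renmin University of China 2016 Outstanding Innovative Talents Cultivation Funding Program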