Ranking WeChat Official Account Based on LambdaMART

QU Beijun, BAI Yu, CAI Dongfeng, CHEN Jianjun

Journal of Chinese Information Processing, 2019, Vol. 33, Issue 12: 101-109.
Section: Information Retrieval and Question Answering Systems


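Abstract (Chinese)

With the spread of mobile applications, WeChat official accounts have become one of the main sources from which people obtain information. Ranking official accounts is a necessary means of finding high-quality information and reducing the cost of managing it. Existing ranking methods mainly assign weights, chosen manually from experience, to quantitative indicators such as total reads and total likes, and ignore the influence of article content on which accounts readers choose. While retaining these quantitative indicators, this paper proposes ranking features based on WeChat article content: topic verticality, posting stability, topic coverage, and topic relevance. The LambdaMART algorithm is used to learn a ranking over this feature set, and principal component analysis (PCA) is used to optimize feature selection. Experimental results show that LambdaMART outperforms other existing methods for ranking official accounts, and further experiments confirm the effectiveness of the features derived from analyzing WeChat article content.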
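The abstract names LambdaMART but does not describe it. As general background, and not as the authors' implementation, LambdaMART combines LambdaRank gradients with gradient-boosted regression trees (MART): for each query it computes a "lambda" for every document from pairwise RankNet terms, each scaled by |ΔNDCG|, the change in NDCG if the two documents swapped positions, and each boosting round fits a regression tree to those lambdas. The minimal Python sketch below computes only the lambdas for one query, treating a candidate list of official accounts as the documents. The function names, the exponential gain 2^rel - 1, and sigma = 1 are assumptions chosen for illustration.

```python
import math


def dcg_gain(label):
    # Exponential gain commonly used with NDCG: 2^rel - 1
    return 2 ** label - 1


def discount(rank):
    # rank is 0-based; position discount 1 / log2(rank + 2)
    return 1.0 / math.log2(rank + 2)


def ideal_dcg(labels):
    # DCG of the best possible ordering (labels sorted descending)
    return sum(dcg_gain(l) * discount(i)
               for i, l in enumerate(sorted(labels, reverse=True)))


def lambdas(scores, labels, sigma=1.0):
    """LambdaRank-style gradients for one query (hypothetical helper).

    scores: current model scores s_i; labels: graded relevance y_i.
    Returns one value per item; positive means "push this item up".
    """
    n = len(scores)
    lam = [0.0] * n
    idcg = ideal_dcg(labels)
    if idcg == 0:
        return lam  # no relevant items: NDCG undefined, no gradient
    # Rank position of each item under the current scores
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rank = [0] * n
    for pos, i in enumerate(order):
        rank[i] = pos
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i should rank above j
            # |delta NDCG| if items i and j swapped positions
            delta = abs((dcg_gain(labels[i]) - dcg_gain(labels[j]))
                        * (discount(rank[i]) - discount(rank[j]))) / idcg
            # RankNet pairwise gradient magnitude, scaled by |delta NDCG|
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * rho * delta
            lam[j] -= sigma * rho * delta
    return lam


if __name__ == "__main__":
    # Hypothetical query: three accounts, current scores and graded labels
    print(lambdas([1.0, 0.5, 0.0], [0, 2, 1]))
```

A full LambdaMART trainer would, in each boosting round, fit a regression tree to these lambdas (typically with a Newton-style leaf value) and add it to the ensemble; that step is omitted here to keep the gradient computation visible.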

Abstract

WeChat official accounts have become one of the important sources of information for people. Existing official-account ranking methods mainly weight indicators such as total reads and total likes heuristically, ignoring the impact of article content on the choice of official accounts. In addition to these quantitative indicators, this paper proposes WeChat text features such as topic verticality, posting stability, topic coverage, and topic relevance. The LambdaMART algorithm is applied to learn the ranking, and feature selection is performed by principal component analysis. Experimental results show that the proposed method outperforms other existing methods.
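The abstract says feature selection is performed by principal component analysis but does not give the procedure. One common heuristic, shown here as an assumption rather than the authors' method, is to standardize the features, keep the smallest number of components that explain a chosen share of variance (0.9 below, an assumed threshold), and score each original feature by its absolute loadings weighted by explained variance. The feature names and the synthetic data are hypothetical stand-ins for the indicators the abstract lists.

```python
import numpy as np


def pca_feature_scores(X, var_threshold=0.9):
    """Score original features by their loadings on the leading components.

    X: (n_samples, n_features). Returns (scores, k): scores[f] is the
    variance-weighted sum of |loading| of feature f over the top-k
    components, where k is the smallest count reaching var_threshold.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so counts (reads, likes) and ratios are comparable
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    cov = np.cov(Z, rowvar=False)
    # eigh returns ascending eigenvalues for a symmetric matrix; reverse them
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    eigvals = np.clip(eigvals, 0.0, None)
    ratio = eigvals / eigvals.sum()
    # First index where cumulative explained variance reaches the threshold
    k = min(int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1,
            len(ratio))
    scores = np.abs(eigvecs[:, :k]) @ ratio[:k]
    return scores, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature names mirroring the indicators in the abstract
    names = ["total_reads", "total_likes", "topic_verticality",
             "posting_stability", "topic_coverage", "topic_relevance"]
    base = rng.normal(size=(200, 1))
    X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),  # reads
                   base + 0.1 * rng.normal(size=(200, 1)),  # likes, tied to reads
                   rng.normal(size=(200, 4))])              # content features
    scores, k = pca_feature_scores(X)
    print("components kept:", k)
    for name, s in sorted(zip(names, scores), key=lambda t: -t[1]):
        print(f"{name:18s} {s:.3f}")
```

Features with small scores contribute little to the retained components and are candidates for removal. In the synthetic data, total_reads and total_likes are built to be strongly correlated, so they are expected to load mainly on the same component. Whether the paper ranks features this way, or instead feeds the projected components to LambdaMART, is not stated on this page.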


Key words

WeChat official account / learning to rank / LambdaMART / principal component analysis

Cite this article

QU Beijun, BAI Yu, CAI Dongfeng, CHEN Jianjun. Ranking WeChat Official Account Based on LambdaMART. Journal of Chinese Information Processing. 2019, 33(12): 101-109


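Funding

Youth Fund for Humanities and Social Sciences Research of the Ministry of Education (17YJCZH003); Natural Science Foundation of Liaoning Province (20170540696)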