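Abstract (translated from the Chinese)
Faced with a massive volume of online reviews, identifying useful features helps consumers select high-quality reviews and supports sound decision-making. Based on the information adoption model, this paper extracts four categories of useful feature sets that influence review quality from digital camera and mobile phone datasets. Using logistic ridge regression and a basic decision tree model as baselines, combined with the recursive feature elimination (RFE) dimensionality reduction method, it compares and tests the performance of the GBDT model on review quality classification and feature reduction, and reveals the "contribution" of each feature to the classification result, thereby identifying key features. The experimental results show that the GBDT-based model classifies review quality well, and that review posting time, reviewer ranking, number of key features, and review word count are the key features influencing review quality.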
Abstract
Faced with hundreds of thousands of online reviews, identifying helpful review features helps consumers find high-quality reviews to support decision-making. Based on the information adoption model, this paper examines four kinds of useful feature sets, totaling seventeen features, in the camera and mobile phone domains. With logistic ridge regression and decision tree models as baselines, the paper investigates the GBDT model in review quality classification and feature reduction, revealing each feature's contribution as the basis for identifying key features. The experimental results show that timeliness, reviewer ranking, number of key product features, and review word count are the key features influencing review quality, and that they form the optimized feature set for the GBDT model.
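The workflow summarized in the abstract (rank features by GBDT contribution, then compare against an RFE-reduced logistic ridge baseline) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the camera and mobile phone review datasets, and the choices of 50 trees and 4 retained features are hypothetical settings picked only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the review data (assumption): 17 features,
# binary label for high- vs. low-quality reviews.
X, y = make_classification(
    n_samples=400, n_features=17, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit GBDT on all features and read off each feature's contribution.
gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbdt.fit(X_train, y_train)
importances = gbdt.feature_importances_      # normalized impurity-based scores
ranking = np.argsort(importances)[::-1]      # feature indices, most important first
top4 = ranking[:4]                           # hypothetical cut-off of 4 key features

# Baseline: L2-penalized (ridge) logistic regression reduced with RFE.
# LogisticRegression applies an L2 penalty by default.
ridge_logit = LogisticRegression(C=1.0, max_iter=1000)
rfe = RFE(ridge_logit, n_features_to_select=4).fit(X_train, y_train)
rfe_selected = np.flatnonzero(rfe.support_)

# Compare GBDT accuracy on the full set and on the reduced set.
acc_full = gbdt.score(X_test, y_test)
gbdt_reduced = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbdt_reduced.fit(X_train[:, top4], y_train)
acc_reduced = gbdt_reduced.score(X_test[:, top4], y_test)

print("GBDT top features:", top4.tolist())
print("RFE-selected features:", rfe_selected.tolist())
print("accuracy (all 17):", acc_full, "accuracy (top 4):", acc_reduced)
```

Note that `feature_importances_` is scikit-learn's impurity-based importance; the paper's "feature contribution" measure may be defined differently, so treat the ranking here as a proxy.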
Key words
GBDT /
review quality /
feature contribution /
information adoption model /
recursive feature elimination
WANG Hongwei (王洪伟); MENG Yuan (孟园).
Helpful Features Identification of Online Reviews Quality Based on GBDT Feature Contribution. Journal of Chinese Information Processing. 2017, 31(3): 109-117
Funding
National Natural Science Foundation of China (71371144; 71601082)