Abstract
Consumption intent is the willingness, explicitly expressed by a user in text, to purchase a product or service, as in “I want to buy a mobile phone”. This paper proposes a method for identifying consumption intent in microblog posts based on users' naturally annotated resources. The method casts consumption intent recognition as a domain adaptation problem. Using an automatically acquired training corpus, it first builds a classifier on features shared by the source and target domains, then extracts pseudo-labeled consumption-intent microblogs with high confidence, and finally trains a new classifier on microblog-specific features to identify consumption intent in microblogs. Experimental results show that the method is effective, reaching F-values of 69% and 77%, and that every feature used contributes to recognition performance.
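The three steps named in the abstract (a classifier on shared features, high-confidence pseudo-labeling, and a new classifier on target-domain features) can be sketched roughly as follows. This is a minimal illustration only, not the paper's implementation: the abstract does not name the classifier, the feature set, or the confidence threshold, so the Naive Bayes model, the bag-of-words features, the whitespace tokenizer, the 0.8 threshold, and the toy data are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Placeholder tokenizer (assumption); Chinese microblogs need a word segmenter.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes restricted to a fixed vocabulary."""

    def __init__(self, vocab):
        self.vocab = set(vocab)

    def fit(self, texts, labels):
        if not texts:
            raise ValueError("no training examples")
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(t for t in tokenize(text) if t in self.vocab)
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        size = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.prior[c] / self.n)
            for t in tokens:
                # Laplace (add-one) smoothing over the vocabulary.
                score += math.log((self.counts[c][t] + 1) / (self.total[c] + size))
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def adapt(source_texts, source_labels, target_texts, threshold=0.8):
    source_vocab = {t for s in source_texts for t in tokenize(s)}
    target_vocab = {t for s in target_texts for t in tokenize(s)}
    # Step 1: train on the labeled source domain using only shared features.
    stage1 = NaiveBayes(source_vocab & target_vocab).fit(source_texts, source_labels)
    # Step 2: keep target posts whose predicted label is confident enough.
    pseudo = []
    for text in target_texts:
        proba = stage1.predict_proba(text)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:
            pseudo.append((text, label))
    # Step 3: train a new classifier on target-domain features from pseudo-labels.
    stage2 = NaiveBayes(target_vocab).fit([t for t, _ in pseudo], [l for _, l in pseudo])
    return stage2, pseudo


# Toy data (invented for illustration): 1 = consumption intent, 0 = none.
source_texts = ["want to buy a phone", "where to buy a laptop", "plan to buy a camera",
                "the weather is nice today", "watched a movie today", "the movie was boring"]
source_labels = [1, 1, 1, 0, 0, 0]
target_texts = ["i want to buy a new phone lol", "so tired today the weather is bad",
                "omg plan to buy a laptop soon", "lunch"]

model, pseudo = adapt(source_texts, source_labels, target_texts)
prediction = model.predict("omg want a new phone")
```

In this sketch the post "lunch" shares no features with the source domain, so its confidence stays at chance level and it is left out of the pseudo-labeled set. The second classifier can then use microblog-only tokens such as "omg" that the source-domain classifier never saw.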
Key words
consumption intent /
naturally annotated /
social media /
domain adaptation
Funding
National Science Fund for Young Scientists (61202277); National Natural Science Foundation of China (61170144, 61472107)