Abstract
Most previous studies on personalized search apply personalization to all queries, and few have tried to determine which queries actually benefit from it. In this paper, we mine linguistic knowledge from a large-scale human knowledge base to predict a query's potential for personalization. The acquired linguistic knowledge includes conceptual terms, ambiguous terms, and synonymous terms, which are used to design corresponding features for the predictive models. Using the knowledge mined from Wikipedia as features alleviates the data sparseness of query logs. The experimental results indicate that the approach is effective and feasible.
Key words: query potential for personalization; linguistic knowledge; query logs
Funding
Key Program of the National Natural Science Foundation of China (60736044)