Web Query Classification Based on “VASE” Characterizing Words

WANG Yulin, SUN Le, LI Wenbo

Journal of Chinese Information Processing ›› 2009, Vol. 23 ›› Issue (3): 39-45.
Review


Abstract

Web query classification is important for improving the search quality of search engines. By analyzing and manually labeling real user query logs, we found that four kinds of words, which we call “VASE” characterizing words, play a decisive role in determining the category of a query. We extracted these words and built an inverted index over them, which is used to classify queries by topic. On this basis, we propose methods based on Web extension and weighted characterizing words to improve the classification results. Experimental results show that the method reaches a precision of 78.2% and a recall of 77.3%, meeting practical requirements.
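The abstract describes an inverted index over characterizing words that is used to assign a topic to a query, later refined with word weights. The following is only a minimal sketch of that general idea, not the authors' implementation: the sample words, categories, weights, the whitespace tokenization, and the function names `build_index` and `classify` are all hypothetical, and the Web extension step is not shown.

```python
from collections import defaultdict


def build_index(labeled_words):
    """Build an inverted index: characterizing word -> {category: weight}."""
    index = defaultdict(dict)
    for word, category, weight in labeled_words:
        index[word][category] = weight
    return index


def classify(query, index):
    """Sum the weights of indexed words found in the query.

    Returns the highest-scoring category, or None when no indexed
    word occurs in the query.
    """
    scores = defaultdict(float)
    # Assumption: queries are split on whitespace; real Chinese queries
    # would need word segmentation or substring matching instead.
    for token in query.lower().split():
        for category, weight in index.get(token, {}).items():
            scores[category] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical (word, category, weight) entries, for illustration only.
SAMPLE_WORDS = [
    ("download", "software", 1.0),
    ("lyrics", "music", 2.0),
    ("mp3", "music", 1.5),
    ("hotel", "travel", 2.0),
]

index = build_index(SAMPLE_WORDS)
print(classify("free mp3 download", index))  # prints "music"
```

In this sketch a query with no indexed word stays unclassified; that is the kind of case where an expansion step such as the Web extension mentioned in the abstract would presumably help.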

Key words

computer application / Chinese information processing / Web query classification / “VASE” characteristic words / Web extension / weighted words

Cite this article

WANG Yulin, SUN Le, LI Wenbo. Web Query Classification Based on “VASE” Characterizing Words. Journal of Chinese Information Processing. 2009, 23(3): 39-45


Funding

National Natural Science Foundation of China (60773027, 60736044); National 863 Program key projects (2006AA010108, 2008AA01Z145)