Abstract
Long queries are user-submitted queries with complex sentence structure. Current search engines perform well on keyword retrieval, but when every word of a long query is matched as a keyword, they often return only very limited results. In this paper, we use the degree of association between the keywords of a query to detect semantic entailment and delete the keywords that carry little information, thereby improving retrieval. We evaluate the results from two perspectives, “machine-oriented” and “user-oriented”. The machine-oriented evaluation is automatic and is based on the highlight ratio and the number of results returned by the search engine. The user-oriented evaluation relies on human judgment of the retrieved documents. The experimental results show that our method clearly improves both the quantity and the quality of the retrieval results.
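The idea of deleting keywords that are implied by other keywords can be illustrated with a small sketch. The abstract does not specify how word association is measured, so the code below assumes a simple document-level conditional probability P(word | other word) estimated from a toy corpus; the function names, the threshold of 0.8, and the corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count document frequency per term and co-occurrence per term pair."""
    df = Counter()
    co = Counter()
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        # Store each unordered pair once, keyed in sorted order.
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1
    return df, co


def implied_score(word, others, df, co):
    """Highest P(word | other) over the other query terms (assumed measure)."""
    best = 0.0
    for other in others:
        if df[other] == 0:
            continue  # unseen term: no evidence either way
        pair = (word, other) if word < other else (other, word)
        best = max(best, co[pair] / df[other])
    return best


def reduce_query(query, df, co, threshold=0.8):
    """Repeatedly drop the term most strongly implied by the remaining terms."""
    kept = list(query)
    while len(kept) > 1:
        scores = {
            w: implied_score(w, [o for o in kept if o != w], df, co)
            for w in kept
        }
        word, score = max(scores.items(), key=lambda kv: kv[1])
        if score < threshold:
            break  # no remaining term is predictable enough to delete
        kept.remove(word)
    return kept


# Toy corpus (hypothetical): each document is a list of terms.
DOCS = [
    ["beijing", "weather", "forecast"],
    ["beijing", "weather"],
    ["weather", "forecast", "today"],
    ["beijing", "hotel"],
    ["forecast", "weather"],
]
df, co = build_counts(DOCS)
print(reduce_query(["beijing", "weather", "forecast"], df, co))
```

In this toy corpus every document containing "forecast" also contains "weather", so "weather" adds little information once "forecast" is present and is the term the sketch removes. A real system would estimate the association statistics from a large corpus or query log and would likely use a more robust measure.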
Key words
query reduction /
word association /
evaluation methods
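Funding
General Program of the National Natural Science Foundation of China (61073126, 61273321); National Natural Science Foundation of China (61133012); National 863 Program for Frontier Technology Research (2012AA011102)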