Abstract
Researchers have repeatedly tried to apply natural language processing (NLP) to information retrieval (IR) in the hope of improving retrieval effectiveness. Most of these attempts, however, produced results opposite to what was hypothesized: in most cases NLP did not improve retrieval effectiveness, and sometimes it had a negative effect. Even where NLP helped, the improvements were usually small and far from commensurate with the computational cost that NLP requires. Researchers who analyzed these findings concluded that NLP is better suited to tasks that require precise results, such as question answering (QA) and information extraction (IE), and that NLP must be optimized for IR before it can have a positive effect. Recent work, for example incorporating NLP into language models, has to some extent confirmed this conclusion.
Key words
artificial intelligence /
natural language processing /
overview /
information retrieval
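Funding
National Basic Research Program of China (973 Program) (2004CB318108); National Natural Science Foundation of China (60621062, 60503064); National High Technology Research and Development Program of China (863 Program) (2006AA01Z141)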