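The multilingual nature of online content is a major obstacle to sharing Internet resources worldwide, and cross-language information retrieval (CLIR) is one effective way to address it. Starting from a definition of a CLIR system, this paper presents a standard CLIR system framework and evaluation method, re-examines the mainstream research approaches, identifies the core problems that CLIR must solve, and finally, based on an analysis of the state of the art, suggests likely key directions for future research.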
Abstract
One of the most frustrating obstacles to sharing online information among people in different countries is the multilingual problem, and research on Cross-Language Information Retrieval (CLIR) plays an important role in addressing it. This paper first gives a formal definition and a standard framework of CLIR. Second, we present the evaluation method for a CLIR system. Then three mainstream approaches in CLIR research are reassessed, and the key problems in CLIR, namely out-of-vocabulary (OOV) terms and word sense disambiguation (WSD), are singled out as the core issues. Finally, based on observations of the state of the art in CLIR, we give several promising directions for CLIR research in the near future.
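Keywords
computer application /
Chinese information processing /
cross-language information retrieval /
out-of-vocabulary words /
word sense disambiguation /
multilingual information retrieval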
Key words
computer application /
Chinese information processing /
cross-language information retrieval /
out-of-vocabulary words /
word sense disambiguation /
multilingual information retrieval
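Funding
Supported by the National Natural Science Foundation of China (60203007); the National High Technology Research and Development Program of China (863 Program) (2003AA1Z2110); and the Beijing Science and Technology Nova Program (H020820790130).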