Abstract
Current approaches to cross-language information retrieval (CLIR) fall into four groups: query translation, document translation, interlingua translation, and translation-free methods. This paper briefly reviews the four approaches, discusses their advantages and disadvantages, and proposes a new translation-free approach based on interlingua semantics. We evaluate the proposed approach on the TREC cross-language corpus and compare it with a monolingual information retrieval model. The experiments show that the approach performs well and is robust.
Key words
computer application /
Chinese information processing /
cross-language information retrieval (CLIR) /
interlingua semantics /
latent semantic pair /
partial least squares (PLS) /
TREC
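Funding
Supported by the National Natural Science Foundation of China (60663007); the Jiangxi Province Science and Technology Key Project (20062184); and the Science and Technology Project of the Jiangxi Provincial Department of Education (20072129).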