ZHENG Jianxi, BAI Yu, GUO Cheng, ZHANG Guiping. The Translation Selection of Anchor Text in Wikipedia Cross-Lingual Link Discovery[J]. Journal of Chinese Information Processing, 2016, 30(2): 196-201.
The Translation Selection of Anchor Text in Wikipedia Cross-Lingual Link Discovery
ZHENG Jianxi, BAI Yu, GUO Cheng, ZHANG Guiping
(Research Center for Knowledge Engineering, Shenyang Aerospace University, Shenyang, Liaoning 110136, China)
Abstract: Wikipedia Cross-Lingual Link Discovery (CLLD) aims to automatically identify topic-related anchor texts in source-language Wikipedia articles and to recommend a set of relevant target-language links for each anchor text. It involves three key problems: anchor text identification, anchor text translation, and target link discovery. To handle anchor texts that have multiple candidate translations, we propose a context-based translation selection method that uses a voting scheme based on pointwise mutual information (PMI). Experiments on translation selection for person names, terminology, and abbreviations in Chinese and English Wikipedia articles show that the method achieves good performance.
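The abstract names PMI-based voting but does not spell out the scheme, so the following is only a minimal sketch under stated assumptions: each context word casts one vote for the candidate translation with which it has the highest positive PMI, and the candidate with the most votes is selected. The function names, the count tables, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi(pair_count, count_x, count_y, total):
    """PMI = log2(p(x, y) / (p(x) * p(y))), estimated from raw counts.

    Returns -inf when the pair (or either item) was never observed.
    """
    if pair_count == 0 or count_x == 0 or count_y == 0:
        return float("-inf")
    return math.log2(pair_count * total / (count_x * count_y))


def select_translation(candidates, context_words, unigram, cooc, total):
    """Pick a translation by letting each context word vote (assumed scheme).

    A context word votes for the candidate with the highest PMI, and only
    if that PMI is positive. Returns None when no word casts a vote.
    """
    votes = Counter()
    for word in context_words:
        scores = {
            cand: pmi(cooc.get((cand, word), 0),
                      unigram.get(cand, 0),
                      unigram.get(word, 0),
                      total)
            for cand in candidates
        }
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            votes[best] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy counts (not from the paper): two candidate translations
# of one anchor text, and three words from the anchor's context.
TOTAL = 1000
UNIGRAM = {"Apple Inc.": 50, "apple": 40,
           "iPhone": 30, "company": 100, "fruit": 60}
COOC = {("Apple Inc.", "iPhone"): 20, ("Apple Inc.", "company"): 25,
        ("apple", "fruit"): 30, ("apple", "company"): 2}

chosen = select_translation(["Apple Inc.", "apple"],
                            ["iPhone", "company", "fruit"],
                            UNIGRAM, COOC, TOTAL)
print(chosen)
```

In this toy setting two of the three context words vote for "Apple Inc." and one votes for "apple", so the first candidate is chosen. A weighted variant that sums PMI scores instead of counting votes would be an equally plausible reading of the abstract.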