Abstract
WordNet is an English lexical semantic knowledge base that plays an important role in natural language processing. This paper presents a method, named WNCT, for automatically translating the synsets (lexical concepts) in WordNet into Chinese. First, WNCT uses electronic dictionaries and term translation tools to translate the English words in WordNet into Chinese at the granularity of word senses. Second, WNCT treats the selection of the correct sense for each word in a given synset as a classification problem. A classification model is trained on 12 features derived from the uniqueness of translation, the translation intersections within and between concepts, the structure rules of Chinese phrases, and PMI-based translation relevance. Experimental results show that WNCT achieves a coverage of 85.21% and an accuracy of 81.37% for the Chinese translation of the synsets in WordNet 3.0.
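The abstract names the feature families but does not define them, so the following is only a minimal sketch of how three of them might be computed for one candidate translation. The function names, the count-based PMI inputs, and the toy synset are all assumptions made for illustration; they are not taken from WNCT itself.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2(P(x, y) / (P(x) * P(y))), from raw counts.

    The counts are assumed to come from some corpus or web statistics;
    the abstract does not say which source WNCT uses.
    """
    if min(count_xy, count_x, count_y) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((count_xy * total) / (count_x * count_y))


def candidate_features(word, candidate, synset_candidates):
    """Two illustrative features for one candidate translation of `word`.

    synset_candidates maps each English word of a synset to the set of
    Chinese candidates returned by a dictionary lookup (hypothetical data).
    """
    own = synset_candidates[word]
    others = [cands for w, cands in synset_candidates.items() if w != word]
    return {
        # Translation uniqueness: the word has exactly one candidate.
        "unique": len(own) == 1,
        # Intra-concept intersection: how many other words in the synset
        # share this candidate translation.
        "intra_synset_hits": sum(candidate in cands for cands in others),
    }


# Hypothetical candidates for a three-word synset meaning "car".
SYNSET = {
    "car": {"汽车", "车厢"},
    "automobile": {"汽车"},
    "motorcar": {"汽车", "机动车"},
}
```

With this toy data, `candidate_features("car", "汽车", SYNSET)` reports two intra-synset hits while `candidate_features("car", "车厢", SYNSET)` reports none, which is the kind of signal a classifier could use to prefer the first candidate. A PMI score between a candidate and the translations of neighbouring concepts could be added to the same feature dictionary.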
Key words: artificial intelligence; machine translation; WordNet translation; word translation; translation disambiguation; Chinese lexical knowledge base; Chinese information processing
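Funding
Supported by the National Natural Science Foundation of China (60496326, 60573063, 60573064) and the National High Technology Research and Development Program of China (863 Program) (2007AA01Z325).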