Computation and Analysis of Chinese Word Similarity Based on Semantic Trees

ZHANG Liang1,2, YIN Cunyan1, CHEN Jiajun1

Journal of Chinese Information Processing, 2010, Vol. 24, Issue 6: 23-31.
Review


Chinese Word Similarity Computing Based on Semantic Tree

  • ZHANG Liang1,2, YIN Cunyan1, CHEN Jiajun1

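Abstract (from the Chinese original)

The analysis and computation of word similarity is one of the key technologies of natural language processing and is of considerable help to syntactic parsing, machine translation, information retrieval, and similar tasks. Computing Chinese word similarity on the basis of the semantic resource HowNet has been a research focus in recent years, but most studies are improvements and refinements of the method proposed by Liu Qun of the Institute of Computing Technology, Chinese Academy of Sciences. This paper thoroughly analyzes and exploits the concept architecture and multi-dimensional semantic representation of the new version of HowNet (2007). It analyzes word similarity comprehensively from three aspects (the main-class sememe of a concept, the main-class sememe frame, and the concept characteristic description) and distinguishes semantic-feature similarity from syntactic-feature similarity in the computation. The experimental results are satisfactory and largely agree with human intuitive judgment.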

Abstract

Word similarity analysis and computation is one of the key technologies in natural language processing and can substantially help tasks such as parsing, machine translation, and information retrieval. Chinese word similarity computation based on HowNet has recently become an active research topic, but most existing work improves or modifies the method proposed in (Liu, 2002). Drawing on the concept framework and multi-dimensional semantic representation of the new HowNet (2007), this paper proposes a method that analyzes and computes Chinese word similarity along three dimensions: the main sememe, the main sememe frame, and the concept characteristic description. The method also distinguishes semantic similarity from syntactic similarity in the computation. Experiments show that the method performs well, with results largely consistent with human intuition.
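As a rough illustration of the kind of computation the abstract describes, the following Python sketch converts path distance in a sememe tree into a similarity score and then combines three component scores with weights. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy hierarchy `PARENT`, the conversion `alpha / (distance + alpha)` with `alpha = 1.6`, the function names, and the weights in `combined_similarity`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hierarchy (child -> parent). The node names are
# illustrative and are not copied from HowNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "inanimate": "thing",
    "artifact": "inanimate",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between a and b through their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_from_a, n in enumerate(path_to_root(a)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes do not share a root")


def sememe_similarity(a: str, b: str, alpha: float = 1.6) -> float:
    """Map tree distance to (0, 1]; alpha is an assumed tunable parameter."""
    return alpha / (tree_distance(a, b) + alpha)


def combined_similarity(
    main_sememe: float,
    sememe_frame: float,
    description: float,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # assumed weights
) -> float:
    """Weighted sum of the three component similarities named in the abstract."""
    parts = (main_sememe, sememe_frame, description)
    return sum(w * s for w, s in zip(weights, parts))


if __name__ == "__main__":
    print(tree_distance("human", "animal"))
    print(sememe_similarity("human", "artifact"))
```

A weighted sum is only one way to merge the three dimensions; the paper's actual combination rule, and how it separates semantic from syntactic similarity, would have to be taken from the full text.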

Keywords (from the Chinese original)

semantic tree / word similarity / HowNet 2007 / semantic distance

Key words

semantic tree / word similarity / HowNet 2007 / semantic distance
 

Cite this article

ZHANG Liang, YIN Cunyan, CHEN Jiajun. Chinese Word Similarity Computing Based on Semantic Tree. Journal of Chinese Information Processing. 2010, 24(6): 23-31

References

[1] Liu Qun, Li Sujian. Word semantic similarity computation based on HowNet [C]//The 3rd Chinese Lexical Semantics Workshop. Taipei, China, 2002.
[2] Rebecca Green, Bonnie J. Dorr. Inducing a Semantic Frame Lexicon from WordNet Data [C]//Proceedings of the 2nd Workshop on Text Meaning and Interpretation (ACL 2004).
[3] Li Juanzi. Research on Chinese word sense disambiguation methods [D]. Doctoral dissertation, Tsinghua University, 1999.
[4] Lu Song. Unsupervised acquisition of word relatedness knowledge in natural language and construction of a balanced classifier [D]. Doctoral dissertation, Institute of Computing Technology, Chinese Academy of Sciences, 2001.
[5] I. Dagan, L. Lee, F. Pereira. Similarity-based models of word cooccurrence probabilities [J]. Machine Learning, Special Issue on Machine Learning and Natural Language, 1999.
[6] Dong Zhendong, Dong Qiang. HowNet [DB/OL]. http://www.keenage.com
[7] Dong Zhendong, Dong Qiang, Hao Changling. Theoretical findings of HowNet [J]. Journal of Chinese Information Processing, 2007, 21(4): 3-9.
[8] Dekang Lin. An Information-Theoretic Definition of Similarity [C]//Proceedings of the Fifteenth International Conference on Machine Learning. 1998.
[9] Eneko Agirre, German Rigau. A Proposal for Word Sense Disambiguation using Conceptual Distance [C]//Proceedings of the First International Conference on Recent Advances in NLP. 1995.
[10] A. Budanitsky, G. Hirst. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures [C]//Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics. 2001.
[11] Li Feng, Li Fang. Chinese word semantic similarity computation based on HowNet 2000 [J]. Journal of Chinese Information Processing, 2007, 21(3): 99-105.
[12] Wu Jian, Wu Zhaohui, Li Ying, et al. Web service discovery based on ontology and lexical semantic similarity [J]. Chinese Journal of Computers, 2005, 28(4).
[13] Zhu Yanlan, Min Jin, Zhou Yaqian, Huang Xuanjing, et al. Semantic orientation computation of words based on HowNet [J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.

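Funding

National High Technology Research and Development Program of China (863 Program) (2006AA010109); National Natural Science Foundation of China (60673043)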