Computing Semantic Similarity of Chinese Words Based on HowNet 2000

LI Feng, LI Fang

Journal of Chinese Information Processing, 2007, Vol. 21, Issue 3: 99-105.
Review


An New Approach Measuring Semantic Similarity in Hownet 2000

  • LI Feng, LI Fang

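Abstract (translated from the Chinese)

A common way to compute semantic similarity between words is to use a semantic dictionary organized as a taxonomy, such as WordNet. This paper first uses the tree-structured hierarchy of “sememes” in HowNet to obtain the similarity between “sememes”, and then derives the similarity between words (“concepts”) from those sememe similarities. Drawing on the idea of the information content of things, the paper proposes that how much a “sememe” contributes to describing a “concept” in HowNet depends on the amount of semantic information the sememe itself carries, and that a sememe's description of a concept falls into one of two classes: direct description and indirect description. Chinese word semantic similarity computed on this basis agrees, to some extent, more closely with human intuition.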

Abstract

A basic approach to measuring semantic similarity/distance between words and concepts is to use a lexical taxonomy, such as WordNet. HowNet is a Chinese semantic dictionary that contains abundant semantic information and ontological knowledge, but its construction and architecture are quite different. In this paper, we present a new approach that uses HowNet and draws on ideas from information theory. We propose that the more semantic information a “sememe” carries, the more powerful it is in describing concepts. We then divide the “sememes” that describe a concept into two sets: a directly describing part and an indirectly describing part. In experiments, we demonstrate that our method improves performance in measuring semantic similarity between Chinese words.
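The abstract does not give formulas, so the following is only a minimal sketch of the kind of pipeline it describes: sememe similarity from a tree hierarchy, weights that grow with a sememe's information content, and separate treatment of directly and indirectly describing sememes. Everything concrete here is an assumption made for illustration and is not taken from the paper or from HowNet: the toy tree `PARENT`, the descendant-count estimate of information content, the Lin-style similarity, the best-match set comparison, and the mixing weight `beta`.

```python
import math

# Hypothetical toy sememe tree (child -> parent); NOT real HowNet data.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
    "tool": "artifact",
}

def ancestors(s):
    """Path from sememe s up to the root, starting with s itself."""
    path = []
    while s is not None:
        path.append(s)
        s = PARENT[s]
    return path

def descendant_count(s):
    """Number of sememes strictly below s in the tree."""
    return sum(1 for t in PARENT if t != s and s in ancestors(t))

def info_content(s):
    """Assumed estimate: fewer descendants means more specific, so more information."""
    return 1.0 - math.log(descendant_count(s) + 1) / math.log(len(PARENT))

def sememe_sim(a, b):
    """Lin-style similarity via the lowest common ancestor (an assumed choice)."""
    if a == b:
        return 1.0
    anc_a = ancestors(a)
    lca = next(x for x in ancestors(b) if x in anc_a)  # root is always shared
    denom = info_content(a) + info_content(b)
    return 2.0 * info_content(lca) / denom if denom > 0 else 0.0

def set_sim(xs, ys):
    """Symmetric best-match similarity between two sememe lists, weighted by information content."""
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0

    def one_way(src, dst):
        weights = [info_content(s) for s in src]
        if sum(weights) == 0:  # only root sememes: fall back to equal weights
            weights = [1.0] * len(src)
        best = [max(sememe_sim(s, t) for t in dst) for s in src]
        return sum(w * b for w, b in zip(weights, best)) / sum(weights)

    return 0.5 * (one_way(xs, ys) + one_way(ys, xs))

def concept_sim(c1, c2, beta=0.7):
    """Mix direct and indirect parts; beta is an assumed weight, not from the paper."""
    direct = set_sim(c1["direct"], c2["direct"])
    indirect = set_sim(c1["indirect"], c2["indirect"])
    return beta * direct + (1.0 - beta) * indirect

# Hypothetical concept descriptions, invented for this example.
worker = {"direct": ["human"], "indirect": ["tool"]}
farmer = {"direct": ["human"], "indirect": ["animal"]}
hammer = {"direct": ["tool"], "indirect": ["human"]}

print(concept_sim(worker, farmer), concept_sim(worker, hammer))
```

In this toy tree, `worker` and `farmer` share their direct sememe and therefore score higher than `worker` and `hammer`, whose sememes only meet near the root. Weighting by information content means that a match on a specific sememe such as `human` counts for more than a match on a general one such as `thing`.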

Keywords

computer application / Chinese information processing / word semantic similarity / HowNet / “sememe” / semantic information content

Cite this article

LI Feng, LI Fang. An New Approach Measuring Semantic Similarity in Hownet 2000. Journal of Chinese Information Processing. 2007, 21(3): 99-105
