Abstract: A basic approach to measuring semantic similarity or distance between words and concepts is to use a lexical taxonomy such as WordNet. HowNet is a Chinese semantic dictionary that contains abundant semantic information and ontological knowledge, but its construction and architecture are quite different from WordNet's. In this paper, we present a new HowNet-based approach that draws on ideas from information theory. We propose that the more semantic information a “sememe” carries, the more powerful it is in describing concepts. We then divide the sememes that describe a concept into two sets: a directly describing part and an indirectly describing part. Experiments show that our method improves performance in measuring semantic similarity between Chinese words.
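The abstract gives no formulas, so the following is only a minimal sketch of how the two ideas might fit together, under assumed definitions. It takes the information content of a sememe as IC(s) = -log p(s), with p(s) estimated from usage counts aggregated up a sememe hierarchy. It scores two sememes with Lin's information-theoretic similarity and compares sememe sets by IC-weighted best-match averaging, so that more informative sememes count more. Finally, it combines the directly and indirectly describing parts with a weight `beta`. The toy hierarchy, the counts, and every name (`PARENT`, `COUNTS`, `set_sim`, `concept_sim`, `beta`) are hypothetical illustrations, not HowNet data and not the paper's actual method.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy sememe hierarchy (child -> parent); NOT taken from HowNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "human": "animate",
    "animal": "animate",
    "inanimate": "physical",
    "tool": "inanimate",
}

# Hypothetical counts of how often each sememe appears in concept definitions.
COUNTS: Dict[str, int] = {
    "entity": 1, "thing": 2, "physical": 3, "animate": 4,
    "human": 10, "animal": 8, "inanimate": 4, "tool": 6,
}

# A concept is modeled as (directly describing sememes, indirectly describing sememes).
Concept = Tuple[Set[str], Set[str]]


def ancestors(s: str) -> List[str]:
    """Return s followed by its ancestors up to the root."""
    path: List[str] = []
    node: Optional[str] = s
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def subtree_freq() -> Dict[str, int]:
    """Propagate each sememe's count to all of its ancestors."""
    freq = {s: 0 for s in PARENT}
    for s, c in COUNTS.items():
        for a in ancestors(s):
            freq[a] += c
    return freq


FREQ = subtree_freq()
TOTAL = FREQ["entity"]  # the root covers every count


def ic(s: str) -> float:
    """Information content: rarer (more specific) sememes carry more information."""
    return -math.log(FREQ[s] / TOTAL)


def sememe_sim(a: str, b: str) -> float:
    """Lin-style similarity: 2 * IC(lowest common ancestor) / (IC(a) + IC(b))."""
    anc_b = set(ancestors(b))
    lcs = next(x for x in ancestors(a) if x in anc_b)
    denom = ic(a) + ic(b)
    return 1.0 if denom == 0 else 2 * ic(lcs) / denom


def set_sim(A: Set[str], B: Set[str]) -> float:
    """IC-weighted average of each sememe's best match in the other set."""
    if not A or not B:
        return 1.0 if A == B else 0.0
    num = sum(ic(a) * max(sememe_sim(a, b) for b in B) for a in A)
    num += sum(ic(b) * max(sememe_sim(a, b) for a in A) for b in B)
    den = sum(ic(s) for s in A) + sum(ic(s) for s in B)
    return num / den if den > 0 else 1.0


def concept_sim(c1: Concept, c2: Concept, beta: float = 0.7) -> float:
    """Weight the direct part by beta and the indirect part by 1 - beta (beta is hypothetical)."""
    (d1, i1), (d2, i2) = c1, c2
    return beta * set_sim(d1, d2) + (1 - beta) * set_sim(i1, i2)


if __name__ == "__main__":
    doctor: Concept = ({"human"}, {"tool"})
    vet: Concept = ({"human"}, {"animal", "tool"})
    hammer: Concept = ({"tool"}, {"human"})
    print(concept_sim(doctor, vet), concept_sim(doctor, hammer))
```

With these toy values, `doctor` and `vet` share their direct sememe and score well above `doctor` and `hammer`. Giving the direct part the larger weight is one plausible reading of the direct/indirect split, not a value reported in the paper.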