WU Zuoyan, WANG Yu. A New Measure of Semantic Similarity Based on Hierarchical Network of Concepts[J]. Journal of Chinese Information Processing, 2014, 28(2): 37-43.
A New Measure of Semantic Similarity Based on Hierarchical Network of Concepts
WU Zuoyan, WANG Yu
School of Management Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
Abstract: This paper proposes a method for computing word similarity based on the Hierarchical Network of Concepts (HNC) theory of natural language processing. The method draws on HNC's concept representation system for lexical-level association. Following the coding rules of HNC mapping symbols and the theory of symbol mapping, it combines four components, namely concept connotation, external concept features, concept category, and combination symbols, to compute the similarity between words. The results are compared with a HowNet-based word similarity algorithm and with subjective human judgments. Experiments show that the method reflects semantic differences between words well and is largely consistent with human intuition, which suggests that it is an effective and feasible approach.