WANG Shi, CAO Cungen, PEI Yajun, XIA Fei. A Collocation-based Method for Semantic Similarity Measure for Chinese Words[J]. Journal of Chinese Information Processing (中文信息学报), 2013, 27(1): 7-15.
A Collocation-based Method for Semantic Similarity Measure for Chinese Words
WANG Shi¹, CAO Cungen¹, PEI Yajun³, XIA Fei¹,²
1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. China National Committee for Terms in Sciences and Technologies, Beijing 100717, China
Abstract: Word similarity measures play a basic role in many NLP applications. In this paper, we propose a novel and practical method for this purpose with acceptable precision. Guided by the classic distributional hypothesis that “similar words occur in similar contexts”, we suggest that the collocations in two-word noun phrases serve as better contexts than adjacent words because they are more semantically related. Using a large-scale set of automatically built noun phrases, we first construct tf-idf weighted word vectors containing direct and indirect collocations, and then take the cosine similarities of these vectors as the semantic similarities. To compare with related approaches, we manually design a benchmark test set. On this benchmark, the proposed method achieves correlation coefficients of 0.703, 0.509, and 0.700 on nouns, verbs, and adjectives, respectively, at 100% coverage.
Key words: semantic similarity; word collocation; similarity benchmark set
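The vector construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers direct collocations only (the abstract does not define how indirect collocations are derived), the tf-idf formula is one common variant chosen as an assumption, and the function names `build_vectors` and `cosine` as well as the toy English phrases are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_vectors(phrases):
    """Build tf-idf weighted collocation vectors from (modifier, head) pairs.

    Assumption: tf is the raw collocation count and idf is
    log(number of words / number of words sharing the collocate).
    """
    counts = defaultdict(Counter)
    for modifier, head in phrases:
        # Each word records its partner, tagged with the partner's role.
        counts[modifier][("head", head)] += 1
        counts[head][("mod", modifier)] += 1
    n_words = len(counts)
    # Document frequency: how many distinct words have a given collocate.
    df = Counter(feature for vec in counts.values() for feature in vec)
    return {
        word: {f: tf * math.log(n_words / df[f]) for f, tf in vec.items()}
        for word, vec in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy two-word noun phrases (illustrative data, not from the paper).
phrases = [
    ("red", "apple"), ("red", "pear"), ("sweet", "apple"),
    ("sweet", "pear"), ("red", "car"), ("fast", "car"),
]
vectors = build_vectors(phrases)
sim_fruit = cosine(vectors["apple"], vectors["pear"])
sim_mixed = cosine(vectors["apple"], vectors["car"])
```

In this toy data, "apple" and "pear" share all of their modifiers, so `sim_fruit` is higher than `sim_mixed`, where "apple" and "car" share only the frequent modifier "red", which the idf term down-weights.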