Review
WANG Shi1, CAO Cungen1, PEI Yajun3, XIA Fei1,2
2013, 27(1): 7-15.
Measuring word similarity plays a basic role in many NLP applications. In this paper, we propose a novel and practical method for this purpose with acceptable precision. Guided by the classic distributional hypothesis that “similar words occur in similar contexts”, we suggest that the collocations in two-word noun phrases serve as better contexts than adjacent words because they are more semantically related. Using a large-scale, automatically built set of noun phrases, we first construct tf-idf weighted word vectors containing direct and indirect collocations, and then take the cosine similarity between vectors as the semantic similarity. To compare with related approaches, we manually designed a benchmark test set. On this test set, the proposed method achieves correlation coefficients of 0.703, 0.509, and 0.700 on nouns, verbs, and adjectives, respectively, at 100% coverage.
Key words: semantic similarity, word collocation, similarity benchmark set
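
The vector construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy phrase list, the function names `build_vectors` and `cosine`, the `("L", word)` / `("R", word)` context encoding, and the idf formula `log(N / df)` are all assumptions made for illustration. Only direct collocations are modelled, since the abstract does not define how indirect collocations are derived.

```python
import math
from collections import Counter, defaultdict

def build_vectors(phrases):
    """Build tf-idf weighted collocation vectors from two-word noun phrases.

    phrases: iterable of (left_word, right_word) pairs (hypothetical input format).
    Each word's contexts are its direct collocates, tagged by side.
    """
    counts = defaultdict(Counter)
    for left, right in phrases:
        counts[left][("R", right)] += 1   # right-hand collocate of `left`
        counts[right][("L", left)] += 1   # left-hand collocate of `right`

    # Document frequency: number of distinct words each context occurs with.
    df = Counter()
    for contexts in counts.values():
        df.update(contexts.keys())

    n_words = len(counts)
    # Assumed weighting: raw count times log(N / df).
    return {
        word: {c: tf * math.log(n_words / df[c]) for c, tf in contexts.items()}
        for word, contexts in counts.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy data, invented for this example only.
phrases = [
    ("red", "apple"), ("green", "apple"), ("sweet", "apple"),
    ("green", "pear"), ("sweet", "pear"),
    ("red", "car"), ("fast", "car"),
]
vectors = build_vectors(phrases)
sim_apple_pear = cosine(vectors["apple"], vectors["pear"])
sim_apple_car = cosine(vectors["apple"], vectors["car"])
print(f"apple~pear: {sim_apple_pear:.3f}, apple~car: {sim_apple_car:.3f}")
```

In this toy data "apple" and "pear" share two modifiers while "apple" and "car" share one, so the first pair scores higher. A real system would build the phrase list from a large corpus and would need a definition of indirect collocations, which this sketch leaves out.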