Article
JIANG Zhenchao; LI Lishuang; HUANG Degen
2017, 31(3): 25-31.
In natural language processing tasks, distributed word representations have succeeded in capturing semantic regularities and have been used as extra features. However, most word representation models are based on shallow context windows, which are not sufficient to express the meaning of words. The essence of word meaning lies in word relations, which consist of three elements: relation type, relation direction, and related items. In this paper, we leverage a large set of unlabeled texts to make explicit the semantic regularities that emerge in word relations, including dependency relations and context relations, and put forward a novel architecture for computing continuous vector representations. We define three different top layers in the neural network architecture, corresponding to relation type, relation direction, and related words, respectively. Unlike other models, the relation model can use deep syntactic information to train word representations. Evaluated on a word analogy task and a Protein-Protein Interaction extraction task, the results show that the relation model performs better overall than other models in capturing semantic regularities.
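To make the idea of three top layers concrete, the following is a minimal sketch (not the authors' implementation) of a network with one shared word embedding and three prediction heads, one each for relation type, relation direction, and the related word. All names, dimensions, and the PyTorch framing are assumptions for illustration; the paper's actual training objective and layer details may differ.

```python
import torch
import torch.nn as nn

class RelationWordModel(nn.Module):
    """Hypothetical sketch: shared embedding with three top layers,
    one per element of a word relation (type, direction, related word)."""

    def __init__(self, vocab_size: int, emb_dim: int, n_relation_types: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Three separate top layers over the shared word representation.
        self.type_head = nn.Linear(emb_dim, n_relation_types)  # relation type (e.g., dependency label or context)
        self.dir_head = nn.Linear(emb_dim, 2)                   # relation direction (head->dependent or dependent->head)
        self.word_head = nn.Linear(emb_dim, vocab_size)         # related word

    def forward(self, word_ids: torch.Tensor):
        h = self.embed(word_ids)  # shared continuous vector representation
        return self.type_head(h), self.dir_head(h), self.word_head(h)

# Toy usage with made-up sizes: predict relation attributes for a batch of words.
model = RelationWordModel(vocab_size=10000, emb_dim=100, n_relation_types=40)
word_ids = torch.randint(0, 10000, (8,))
type_logits, dir_logits, word_logits = model(word_ids)
print(type_logits.shape, dir_logits.shape, word_logits.shape)
```

After training with a cross-entropy loss on each head over relations extracted from unlabeled text, the shared embedding table would serve as the word representations; this multi-head framing is only one plausible reading of the architecture described in the abstract.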