YE Zhonglin 1,3,4, ZHAO Haixing 1,2,3,4, ZHANG Ke 2,3,4, ZHU Yu 2,3,4
1. College of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710062, China
2. College of Computer, Qinghai Normal University, Xining, Qinghai 810008, China
3. Provincial Laboratory of Tibetan Information Processing and Machine Translation, Qinghai Normal University, Xining, Qinghai 810008, China
4. Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining, Qinghai 810008, China
Abstract: Words, as the basic semantic units in language models, are strongly related to their context words across the whole semantic space. Word representation learning aims to map the relationships between words and their context words into a low-dimensional vector space using shallow neural network models. However, existing word representation learning methods usually consider only the syntagmatic relations between words, without directly capturing paradigmatic information. In this paper, we propose a new word representation learning algorithm, DEWE, which integrates the semantic information of the word itself into the training of word representations. The structural and semantic generalization ability of the proposed method is evaluated on six similarity evaluation datasets, and the results consistently confirm the effectiveness of DEWE.
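Since the abstract does not spell out DEWE's training objective, the sketch below only illustrates the two ingredients it refers to: a shallow neural model of the skip-gram-with-negative-sampling family that maps words to low-dimensional vectors from word/context co-occurrences, and the Spearman-correlation protocol used by word-similarity evaluation datasets. The toy corpus, the human-judgment scores, and all hyperparameters are illustrative assumptions, not data from the paper.

```python
# Minimal sketch: shallow skip-gram-style embedding training plus
# similarity-benchmark evaluation. NOT the DEWE algorithm itself.
import numpy as np
from scipy.stats import spearmanr

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

dim, window, lr, neg, epochs = 16, 2, 0.05, 3, 200   # assumed hyperparameters
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # word (input) vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context (output) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for pos, word in enumerate(corpus):
        w = idx[word]
        lo, hi = max(0, pos - window), min(len(corpus), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            c = idx[corpus[ctx_pos]]
            # One observed (positive) pair plus `neg` negative samples;
            # uniform sampling here is a simplification of the usual
            # unigram^0.75 noise distribution.
            targets = [(c, 1.0)] + [(int(rng.integers(len(vocab))), 0.0)
                                    for _ in range(neg)]
            for t, label in targets:
                score = sigmoid(W_in[w] @ W_out[t])
                g = lr * (label - score)
                g_in = g * W_out[t]      # gradient for the center-word vector
                W_out[t] += g * W_in[w]  # update context vector first...
                W_in[w] += g_in          # ...then the center-word vector

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Evaluation protocol: Spearman correlation between model cosine
# similarities and human similarity ratings (made-up scores below).
pairs = [("cat", "dog", 7.0), ("mat", "rug", 6.5), ("cat", "mat", 2.0)]
model = [cosine(W_in[idx[a]], W_in[idx[b]]) for a, b, _ in pairs]
human = [s for _, _, s in pairs]
print("Spearman rho:", spearmanr(model, human)[0])
```

On real benchmarks the human column comes from the dataset itself (for example, the WordSim-353 ratings), and the reported figure is the Spearman correlation between the model's cosine similarities and those ratings, computed over every word pair in the dataset.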