LIU Yang, JI Lixin, HUANG Ruiyang, ZHU Yuhang, LI Xing. A Multi-sense Word Embedding Method Based on Gated Convolution and Hierarchical Attention Mechanism[J]. Journal of Chinese Information Processing, 2018, 32(7): 1-10, 19.
A Multi-sense Word Embedding Method Based on Gated Convolution and Hierarchical Attention Mechanism
LIU Yang, JI Lixin, HUANG Ruiyang, ZHU Yuhang, LI Xing
National Digital Switching System Engineering and Technological R & D Center, Zhengzhou, Henan 450002, China
Abstract: Existing methods that map each word to a single vector ignore polysemy and may therefore introduce ambiguity. Instead, this paper maps a word to multiple vectors and proposes a multi-sense word embedding method that: 1) fuses a hierarchical attention mechanism with a gated convolution mechanism without residual connections at the sub-sense layer and the synthetic sense layer of the words in the selected context window; and 2) obtains the synthetic sense embedding of the target word under an asymmetric window to predict the target word. On a small-scale corpus, the proposed multi-sense word embedding achieves an increase of up to 1.42% in accuracy on the word analogy task and an average improvement of 2.11% (up to 5.47%) on the word similarity tasks WordSim353, MC, RG, and RW. The method also significantly improves language modeling performance compared with other methods that predict target words.
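To make the architecture described in the abstract concrete, the following is a minimal illustrative sketch (not the authors' released implementation, which is not given here) of a gated convolution block without residual connections, in the style of gated linear units, followed by a simple attention pooling over a context window. The module name GatedConvAttentionPool, the 100-dimensional embeddings, and the single-level attention are assumptions standing in for the paper's two-level (sub-sense and synthetic-sense) hierarchical attention.

# Illustrative sketch only: gated convolution (GLU-style, no residual connection)
# plus attention pooling of a context window into one "synthetic sense" vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvAttentionPool(nn.Module):
    def __init__(self, emb_dim: int, kernel_size: int = 3):
        super().__init__()
        # One convolution produces 2*emb_dim channels; half act as sigmoid gates.
        self.conv = nn.Conv1d(emb_dim, 2 * emb_dim, kernel_size,
                              padding=kernel_size // 2)
        # Scoring layer for attention pooling over window positions.
        self.attn = nn.Linear(emb_dim, 1)

    def forward(self, x):
        # x: (batch, window_len, emb_dim) -- embeddings of the context words
        h = self.conv(x.transpose(1, 2))        # (batch, 2*emb_dim, window_len)
        a, b = h.chunk(2, dim=1)
        g = a * torch.sigmoid(b)                # gated linear unit, no residual
        g = g.transpose(1, 2)                   # (batch, window_len, emb_dim)
        w = F.softmax(self.attn(g), dim=1)      # attention weights over positions
        return (w * g).sum(dim=1)               # (batch, emb_dim) pooled vector

# Usage: pool an asymmetric 5-word context window of 100-d embeddings.
if __name__ == "__main__":
    layer = GatedConvAttentionPool(emb_dim=100)
    ctx = torch.randn(8, 5, 100)
    print(layer(ctx).shape)                     # torch.Size([8, 100])

In a skip-gram-style setup, the pooled vector would be used to score candidate target words; the paper's method additionally selects among multiple sense vectors per word, which this sketch omits.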