ZHANG Haitong, KONG Cunliang, YANG Liner, HE Shan, DU Yongping, YANG Erhong. Gated Context-Aware Network for Definition Generation[J]. Journal of Chinese Information Processing, 2020, 34(7): 105-112.
Gated Context-Aware Network for Definition Generation
ZHANG Haitong1,3, KONG Cunliang2,3,4, YANG Liner2,3,4, HE Shan5, DU Yongping1, YANG Erhong2,3
1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; 2. National Language Resources Monitoring and Research Center (CNLR) Print Media Language Branch, Beijing Language and Culture University, Beijing 100083, China; 3. Beijing Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China; 4. School of Information Science, Beijing Language and Culture University, Beijing 100083, China; 5. School of International Studies, Yunnan Normal University, Kunming, Yunnan 650500, China
Abstract: Traditional lexicography relies mainly on manual compilation, which is inefficient and resource-intensive. This paper proposes a gated context-aware network for definition generation, which uses GRUs to model word definitions and automatically generates a textual definition for a target word. The model follows the encoder-decoder architecture. First, the context of the target word is encoded with a bidirectional GRU. Then, different matching strategies let the target word interact with its context, and an attention mechanism incorporates contextual information into the target word embedding at both coarse and fine granularity, yielding the meaning of the target word in the specific context. Based on this contextual and semantic information, the decoder generates a context-dependent definition of the target word. In addition, character-level information about the target word further improves the quality of the generated definitions. Experimental results on the English Oxford dictionary dataset show that the proposed model reduces the perplexity of definition modeling by 4.45 and improves the BLEU score of definition generation by 2.19, and that it generates readable and understandable definitions.
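To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of one plausible realization: a bidirectional GRU context encoder, attention-based matching between the target word and its context, a sigmoid gate fusing the word embedding with the attended context vector, and a GRU decoder initialized from the fused representation. All module names, dimensions, and the exact matching/gating formulas here are assumptions for illustration, not the authors' released implementation; character-level features and the coarse-grained matching branch are omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedContextAwareDefiner(nn.Module):
    # Illustrative sketch only; hyperparameters and structure are assumed.
    def __init__(self, vocab_size, emb_dim=300, hid_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional GRU encodes the example sentence (context).
        self.encoder = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        # Scores a fine-grained match of the target word against each context state.
        self.attn = nn.Linear(emb_dim + 2 * hid_dim, 1)
        # Gate that fuses the target-word embedding with the attended context.
        self.gate = nn.Linear(emb_dim + 2 * hid_dim, emb_dim + 2 * hid_dim)
        self.bridge = nn.Linear(emb_dim + 2 * hid_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, word, context, definition):
        # word: (B,), context: (B, Tc), definition: (B, Td) -- token ids
        w = self.embed(word)                               # (B, E)
        ctx, _ = self.encoder(self.embed(context))         # (B, Tc, 2H)
        # Fine-grained matching: attend over context states given the target word.
        w_exp = w.unsqueeze(1).expand(-1, ctx.size(1), -1)
        scores = self.attn(torch.cat([w_exp, ctx], dim=-1)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)                  # (B, Tc)
        c = torch.bmm(alpha.unsqueeze(1), ctx).squeeze(1)  # (B, 2H)
        # Gated fusion yields a context-specific sense representation.
        fused = torch.cat([w, c], dim=-1)
        sense = torch.sigmoid(self.gate(fused)) * fused
        h0 = torch.tanh(self.bridge(sense)).unsqueeze(0)   # (1, B, H) decoder init
        dec, _ = self.decoder(self.embed(definition), h0)
        return self.out(dec)                               # (B, Td, V) logits

Training such a model would amount to teacher forcing with a cross-entropy loss between the output logits and the gold definition tokens shifted by one position.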