FAN Qinan, KONG Cunliang, YANG Liner, YANG Erhong. Chinese Definition Modeling Based on BERT and Beam Search[J]. Journal of Chinese Information Processing, 2021, 35(11): 80-90.
Chinese Definition Modeling Based on BERT and Beam Search
FAN Qinan1, KONG Cunliang1, YANG Liner1,2, YANG Erhong2
1.School of Information Science, Beijing Language and Culture University, Beijing 100083, China; 2.Advanced Innovation Center for Language Resources, Beijing Language and Culture University, Beijing 100083, China
Abstract: The definition modeling task aims to generate a definition for a given target word. This paper introduces the context of the target word and proposes a definition generation model based on BERT and beam search. A Chinese definition modeling dataset, CWN, is constructed, in which each target word is paired with its context. Experiments on this Chinese dataset and the English Oxford dataset show that the model achieves significant improvements on both. On the CWN dataset in particular, the model improves the BLEU score by 10.47 and the semantic similarity by 0.105 over the baseline model.
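Since the abstract names beam search as the decoding strategy, the following minimal Python sketch illustrates how a beam search decoder of this kind operates over a generation model. The `step` function, token ids, beam size, and length normalization here are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative beam search decoder. `step` is any function mapping a
# token prefix to log-probabilities over the vocabulary (e.g., a BERT-based
# generator). This is a sketch of the decoding strategy, not the paper's code.
import math
from typing import Callable, List, Tuple

def beam_search(
    step: Callable[[List[int]], List[float]],  # prefix -> log-probs over vocab
    bos_id: int,
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 30,
) -> List[int]:
    """Return the highest-scoring token sequence under a simple beam search."""
    beams: List[Tuple[float, List[int]]] = [(0.0, [bos_id])]  # (score, tokens)
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, List[int]]] = []
        for score, tokens in beams:
            log_probs = step(tokens)
            # Expand each live hypothesis by its top beam_size next tokens.
            top = sorted(range(len(log_probs)),
                         key=lambda i: log_probs[i], reverse=True)[:beam_size]
            for tok in top:
                candidates.append((score + log_probs[tok], tokens + [tok]))
        # Keep the beam_size best candidates; set completed hypotheses aside.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos_id:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    pool = finished or beams
    # Length-normalize so longer definitions are not unfairly penalized
    # (an assumed but common choice, not necessarily the paper's).
    return max(pool, key=lambda c: c[0] / len(c[1]))[1]

if __name__ == "__main__":
    # Toy model over a 3-token vocabulary: prefers token 1, then emits EOS
    # (id 2) once the prefix reaches five tokens.
    def toy_step(prefix: List[int]) -> List[float]:
        if len(prefix) >= 5:
            return [math.log(0.1), math.log(0.1), math.log(0.8)]
        return [math.log(0.2), math.log(0.7), math.log(0.1)]

    print(beam_search(toy_step, bos_id=0, eos_id=2, beam_size=2))
```

With beam_size = 1 this reduces to greedy decoding; larger beams trade decoding time for a wider search over candidate definitions.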