Abstract
Definition modeling is the task of generating a definition for a given target word. This paper incorporates the contextual information of the target word and proposes a definition generation model based on BERT and beam search. A Chinese definition modeling dataset, CWN, is constructed with the context of each target word, and experiments are also conducted on the English Oxford dataset. The results show that the model achieves significant improvements on both datasets; on the CWN dataset in particular, it improves the BLEU score by 10.47 and the semantic similarity score by 0.105 over the baseline model. The semantic similarity metric also correlates more closely with human evaluation results. Finally, the paper analyzes four remaining problems in the Chinese definition modeling task.
Keywords
Chinese definition modeling /
BERT /
beam search
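The model described in the abstract decodes definitions with beam search rather than greedy decoding. Below is a minimal, self-contained Python sketch of that decoding strategy, not the paper's actual BERT-based decoder: it keeps the beam_size highest-scoring partial hypotheses at each step and length-normalizes the final scores. The step_fn interface and the toy bigram table in the usage example are hypothetical stand-ins for the model's next-token log-probabilities.

import math

def beam_search(step_fn, start_token, end_token, beam_size=4, max_len=20):
    # step_fn(prefix) returns a list of (next_token, log_prob) candidates.
    # Each hypothesis is a (token list, cumulative log-probability) pair.
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every surviving hypothesis by one token.
        candidates = []
        for tokens, score in beams:
            for token, logp in step_fn(tokens):
                candidates.append((tokens + [token], score + logp))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))  # hypothesis is complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open at max_len also count
    # Length-normalize so longer definitions are not unfairly penalized.
    return max(finished, key=lambda h: h[1] / len(h[0]))

# Usage with a toy bigram table (hypothetical probabilities):
table = {
    "<s>":  [("a", math.log(0.6)), ("the", math.log(0.4))],
    "a":    [("word", math.log(0.7)), ("</s>", math.log(0.3))],
    "the":  [("word", math.log(0.9)), ("</s>", math.log(0.1))],
    "word": [("</s>", math.log(1.0))],
}
tokens, score = beam_search(lambda t: table[t[-1]], "<s>", "</s>", beam_size=2)
print(tokens, score)  # ['<s>', 'a', 'word', '</s>']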
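The abstract reports two automatic metrics, BLEU and semantic similarity. The sketch below shows one plausible way to compute such scores, assuming NLTK's smoothed sentence-level BLEU and a multilingual Sentence-BERT checkpoint (paraphrase-multilingual-MiniLM-L12-v2); the paper's actual evaluation scripts and model choices may differ, and the example definitions are invented.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util

reference = "a large natural stream of water flowing to the sea"
generated = "a natural stream of water that flows into the sea"

# BLEU measures n-gram overlap with the reference definition.
# (NLTK returns BLEU on a 0-1 scale; the paper reports 0-100.)
bleu = sentence_bleu([reference.split()], generated.split(),
                     smoothing_function=SmoothingFunction().method1)

# Semantic similarity: cosine similarity between Sentence-BERT embeddings,
# which tolerates paraphrases that BLEU would penalize.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
emb = model.encode([reference, generated], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()

print(f"BLEU = {bleu:.3f}, semantic similarity = {similarity:.3f}")

Because the embedding-based score rewards meaning-preserving rewordings that share few n-grams with the reference, it is consistent with the abstract's observation that semantic similarity correlates more closely with human judgments.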
Funding
Graduate Innovation Fund of Beijing Language and Culture University (Fundamental Research Funds for the Central Universities) (20YCX139); Project of the Beijing Advanced Innovation Center for Language Resources at Beijing Language and Culture University (TYZ19005); Informatization Project of the State Language Commission (ZDI135-105)