苏玉兰,洪宇,朱鸿雨,武恺莉,张民. 面向问题生成的预训练模型适应性优化方法研究[J]. 中文信息学报, 2022, 36(3): 91-100.
SU Yulan, HONG Yu, ZHU Hongyu, WU Kaili, ZHANG Min. Adaptive Optimization Method of Pre-trained Language Model for Question Generation[J]. Journal of Chinese Information Processing, 2022, 36(3): 91-100.
面向问题生成的预训练模型适应性优化方法研究
苏玉兰,洪宇,朱鸿雨,武恺莉,张民
苏州大学 计算机科学与技术学院,江苏 苏州 215006
Adaptive Optimization Method of Pre-trained Language Model for Question Generation
SU Yulan, HONG Yu, ZHU Hongyu, WU Kaili, ZHANG Min
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Abstract: Automatic question generation (QG) aims to automatically generate the interrogative sentence corresponding to a target answer, given a context. In this paper, we take advantage of pre-trained language models and apply UNILM within the encoder-decoder framework for question generation. In particular, to address the problems of "exposure bias" and "mask heterogeneity" in the decoding phase of the model, we examine a noise-aware training method and transfer learning on UNILM to improve its adaptability. Experiments on SQuAD show that our best model yields state-of-the-art performance on the answer-aware QG task, with BLEU scores of up to 20.31% and 21.95% on split1 and split2, respectively, and on the answer-agnostic QG task, with a BLEU score of 17.90% on split1.
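The noise-aware training mentioned in the abstract is not spelled out in this excerpt. As a rough illustration of the idea only, the Python sketch below corrupts the decoder-side (question) tokens during fine-tuning so that the model learns to condition on imperfect prefixes, which is one common way to reduce exposure bias and the mismatch between masked training inputs and inference-time decoding. The function name, the noise probability, and the 80/20 split between [MASK] and random tokens are illustrative assumptions, not the authors' implementation.

import torch

def corrupt_decoder_input(question_ids: torch.Tensor,
                          mask_id: int,
                          vocab_size: int,
                          noise_prob: float = 0.15) -> torch.Tensor:
    # Illustrative noise-aware corruption (not the paper's exact procedure):
    # randomly replace gold question tokens with [MASK] or random tokens so
    # that training-time decoder inputs resemble the imperfect, masked
    # prefixes the model conditions on at inference time.
    noisy = question_ids.clone()
    noise_pos = torch.rand(question_ids.shape) < noise_prob   # positions to corrupt
    use_mask = torch.rand(question_ids.shape) < 0.8            # 80% -> [MASK], 20% -> random token
    random_ids = torch.randint(low=0, high=vocab_size, size=question_ids.shape)
    noisy[noise_pos & use_mask] = mask_id
    noisy[noise_pos & ~use_mask] = random_ids[noise_pos & ~use_mask]
    return noisy

# Usage note: only the decoder input is corrupted; the training loss is still
# computed against the original (uncorrupted) gold question tokens.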
[1] Rajpurkar P, Zhang J, Lopyrev K, et al. SQuAD: 100,000+ questions for machine comprehension of text[J]. arXiv preprint arXiv: 1606.05250, 2016.
[2] Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv: 1406.1078, 2014.
[3] Gu J, Lu Z, Li H, et al. Incorporating copying mechanism in sequence-to-sequence learning[J]. arXiv preprint arXiv: 1603.06393, 2016.
[4] Gulcehre C, Ahn S, Nallapati R, et al. Pointing the unknown words[J]. arXiv preprint arXiv: 1603.08148, 2016.
[5] Bao H, Dong L, Wei F, et al. UniLMv2: Pseudo-masked language models for unified language model pre-training[J]. arXiv preprint arXiv: 2002.12804, 2020.
[6] Chan Y H, Fan Y C. A recurrent BERT-based model for question generation[C]//Proceedings of the 2nd Workshop on Machine Reading for Question Answering, 2019: 154-162.
[7] Ranzato M A, Chopra S, Auli M, et al. Sequence level training with recurrent neural networks[J]. arXiv preprint arXiv: 1511.06732, 2015.
[8] Du X, Shao J, Cardie C. Learning to ask: Neural question generation for reading comprehension[J]. arXiv preprint arXiv: 1705.00106, 2017.
[9] Scialom T, Piwowarski B, Staiano J. Self-attention architectures for answer-agnostic neural question generation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 6027-6032.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 5998-6008.
[11] Zhou Q, Yang N, Wei F, et al. Neural question generation from text: A preliminary study[C]//Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing. Springer, Cham, 2017: 662-671.
[12] Dong X, Hong Y, Chen X, et al. Neural question generation with semantics of question type[C]//Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing. Springer, Cham, 2018: 213-223.
[13] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv: 1810.04805, 2018.
[14] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[C]//Proceedings of the Advances in Neural Information Processing Systems, 2019: 5753-5763.
[15] Song K, Tan X, Qin T, et al. MASS: Masked sequence to sequence pre-training for language generation[J]. arXiv preprint arXiv: 1905.02450, 2019.
[16] Dong L, Yang N, Wang W, et al. Unified language model pre-training for natural language understanding and generation[C]//Proceedings of the 33rd Conference on Neural Information Processing Systems, 2019: 13063-13075.
[17] Xiao D, Zhang H, Li Y, et al. ERNIE-GEN: An enhanced multi-flow pre-training and fine-tuning framework for natural language generation[J]. arXiv preprint arXiv: 2001.11314, 2020.
[18] Wang M, Smith N A, Mitamura T. What is the Jeopardy model? A quasi-synchronous grammar for QA[C]//Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007: 22-32.
[19] Yang Y, Yih W, Meek C. WikiQA: A challenge dataset for open-domain question answering[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015: 2013-2018.
[20] Garg S, Vu T, Moschitti A. TANDA: Transfer and adapt pre-trained transformer models for answer sentence selection[J]. arXiv preprint arXiv: 1911.04118, 2019.
[21] Cohen D, Yang L, Croft W B. WikiPassageQA: A benchmark collection for research on non-factoid answer passage retrieval[C]//Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018: 1165-1168.
[22] Trischler A, Wang T, Yuan X, et al. NewsQA: A machine comprehension dataset[J]. arXiv preprint arXiv: 1611.09830, 2016.
[23] Joshi M, Choi E, Weld D S, et al. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension[J]. arXiv preprint arXiv: 1705.03551, 2017.
[24] Kwiatkowski T, Palomaki J, Redfield O, et al. Natural questions: A benchmark for question answering research[J]. Transactions of the Association for Computational Linguistics, 2019, 7: 453-466.
[25] Papineni K, Roukos S, Ward T, et al. BLEU: A method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002: 311-318.
[26] Denkowski M, Lavie A. Meteor universal: Language specific translation evaluation for any target language[C]//Proceedings of the 9th Workshop on Statistical Machine Translation, 2014: 376-380.
[27] Lin C Y. ROUGE: A package for automatic evaluation of summaries[C]//Text Summarization Branches Out, 2004: 74-81.
[28] Zhao Y, Ni X, Ding Y, et al. Paragraph-level neural question generation with maxout pointer and gated self-attention networks[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 3901-3910.