1.MOE Key Laboratory of Computational Linguistics, Peking University, Beijing 100871, China; 2.Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China
Question generation (QG) aims to automatically generate fluent, semantically relevant questions for a given text. QG can be applied to generate questions for reading comprehension tests in education, and to enhance question answering and dialog systems. This paper presents a comprehensive survey of research on QG. We first describe the significance of QG and its applications, especially in education. We then outline traditional rule-based methods for QG and describe neural network based models in detail from several perspectives. We also introduce evaluation metrics for generated questions. Finally, we discuss the limitations of previous studies and suggest directions for future work.
WU Yunfang, ZHANG Yangsen. A Survey of Question Generation. Journal of Chinese Information Processing, 2021, 35(7): 1-9.