Data Augmentation Based Automatic Answering of Reading Comprehension in College Entrance Examination

ZHANG Hu1, ZHANG Ying1, YANG Zhizhuo1, QIAN Yili1, LI Ru1,2

Journal of Chinese Information Processing ›› 2021, Vol. 35 ›› Issue (9): 132-140.
Natural Language Understanding and Generation

PDF(2557 KB)

Abstract

Machine reading comprehension is an important research task in natural language processing, and automatically answering reading comprehension questions from the college entrance examination has emerged as a further challenge within it in recent years. At present, the number of available authentic and mock questions for Chinese reading comprehension in the college entrance examination is relatively small, so deep-learning-based methods are constrained by the limited scale of experimental data and show no clear advantage over traditional methods. To address this, this paper explores data augmentation methods for Chinese reading comprehension in the college entrance examination. Building on the traditional EDA approach, it proposes an EDA strategy adapted to such reading comprehension; for the typically long reading materials, it proposes a sliding-window-based dynamic clipping method; and, since sentences within a material differ markedly in importance, it proposes a similarity-based method for evaluating the quality of material sentences. Experimental results show that all three methods improve automatic answering of college entrance examination reading comprehension questions, raising answer accuracy by more than 5 percentage points at best.
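
To make the adapted EDA strategy concrete, here is a minimal Python sketch of the classic EDA operations of Wei and Zou [1] (synonym replacement, random swap, random deletion) applied to one segmented Chinese training sentence. It is an illustration only: the jieba tokenizer, the toy synonym table, and all rates are assumptions, not the paper's actual resources or settings.

```python
import random
import jieba  # Chinese word segmentation; an assumed tokenizer, not specified by the paper

# Toy synonym table for illustration; the paper's actual lexical resource is not given.
SYNONYMS = {
    "文章": ["文本", "篇章"],
    "作者": ["笔者"],
    "说明": ["表明", "阐明"],
}

def synonym_replace(words, n=1):
    """Replace up to n randomly chosen words that have synonym entries."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    random.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = random.choice(SYNONYMS[out[i]])
    return out

def random_swap(words, n=1):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(words, p=0.1):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]

def eda_augment(text, num_aug=3):
    """Generate num_aug augmented variants of one training sentence."""
    words = jieba.lcut(text)
    ops = [synonym_replace, random_swap, random_delete]
    return ["".join(random.choice(ops)(words)) for _ in range(num_aug)]
```

For example, `eda_augment("文章说明了作者的观点。")` would return three lightly perturbed variants of the sentence, each usable as an extra training instance.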

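The second strategy, dynamic material clipping, can be sketched as a sliding window over the material's sentences, so that each training instance sees a bounded-length context instead of the full passage. The sentence splitter, window size, and stride below are illustrative assumptions, not the paper's settings.

```python
import re

def split_sentences(material):
    """Naive splitter on Chinese end punctuation (an assumed preprocessing step)."""
    return [s for s in re.split(r"(?<=[。！？])", material) if s.strip()]

def sliding_windows(sentences, size=8, stride=4):
    """Clip a long material into overlapping windows of `size` sentences,
    advancing `stride` sentences at a time; the final window is anchored to
    the end of the material so no sentence is lost."""
    if len(sentences) <= size:
        return [sentences]
    starts = list(range(0, len(sentences) - size + 1, stride))
    if starts[-1] != len(sentences) - size:
        starts.append(len(sentences) - size)
    return [sentences[i:i + size] for i in starts]
```

With these defaults, a 20-sentence material yields four overlapping 8-sentence windows starting at sentences 0, 4, 8, and 12.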

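Finally, a sketch of similarity-based sentence quality evaluation: material sentences are scored by bag-of-words cosine similarity against the question, so low-scoring sentences can be filtered out, or the most question-relevant clipped window preferred. The segmentation and the cosine measure are assumptions; the paper only states that scoring is based on similarity calculation. `best_window` ties this sketch to the previous one by selecting among the clipped windows it produces.

```python
from collections import Counter
from math import sqrt

import jieba  # assumed tokenizer, as in the EDA sketch

def cosine_sim(a, b):
    """Bag-of-words cosine similarity between two Chinese sentences."""
    va, vb = Counter(jieba.lcut(a)), Counter(jieba.lcut(b))
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sentences, question):
    """Order material sentences from most to least similar to the question."""
    return sorted(sentences, key=lambda s: cosine_sim(s, question), reverse=True)

def best_window(windows, question):
    """Pick the clipped window whose sentences are most similar to the question."""
    return max(windows, key=lambda w: sum(cosine_sim(s, question) for s in w))
```
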
Keywords

reading comprehension / college entrance examination questions / data augmentation / deep learning

Cite This Article

ZHANG Hu, ZHANG Ying, YANG Zhizhuo, QIAN Yili, LI Ru. Data Augmentation Based Automatic Answering of Reading Comprehension in College Entrance Examination. Journal of Chinese Information Processing. 2021, 35(9): 132-140

References

[1] Wei J, Zou K. EDA: Easy data augmentation techniques for boosting performance on text classification tasks[J]. arXiv preprint arXiv:1901.11196, 2019.
[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[3] Cubuk E D, Zoph B, Mane D, et al. AutoAugment: Learning augmentation policies from data[J]. arXiv preprint arXiv:1805.09501, 2018.
[4] Cui X, Goel V, Kingsbury B. Data augmentation for deep neural network acoustic modeling[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(9): 1469-1477.
[5] Ko T, Peddinti V, Povey D, et al. Audio augmentation for speech recognition[C]//Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015: 3586-3589.
[6] Kobayashi S. Contextual augmentation: Data augmentation by words with paradigmatic relations[J]. arXiv preprint arXiv:1805.06201, 2018.
[7] Wu X, Lv S, Zang L, et al. Conditional BERT contextual augmentation[C]//Proceedings of the International Conference on Computational Science. Springer, Cham, 2019: 84-95.
[8] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[9] Zhu H, Dong L, Wei F, et al. Learning to ask unanswerable questions for machine reading comprehension[J]. arXiv preprint arXiv:1906.06045, 2019.
[10] Rajpurkar P, Jia R, Liang P. Know what you don't know: Unanswerable questions for SQuAD[J]. arXiv preprint arXiv:1806.03822, 2018.
[11] Yu A W, Dohan D, Luong M T, et al. QANet: Combining local convolution with global self-attention for reading comprehension[J]. arXiv preprint arXiv:1804.09541, 2018.
[12] Artetxe M, Labaka G, Agirre E, et al. Unsupervised neural machine translation[J]. arXiv preprint arXiv:1710.11041, 2017.
[13] Iyyer M, Manjunatha V, Boyd-Graber J, et al. Deep unordered composition rivals syntactic methods for text classification[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015: 1681-1691.
[14] Lample G, Conneau A, Denoyer L, et al. Unsupervised machine translation using monolingual corpora only[J]. arXiv preprint arXiv:1711.00043, 2017.
[15] Xie Z, Wang S I, Li J, et al. Data noising as smoothing in neural network language models[J]. arXiv preprint arXiv:1703.02573, 2017.
[16] Fadaee M, Bisazza A, Monz C. Data augmentation for low-resource neural machine translation[J]. arXiv preprint arXiv:1705.00440, 2017.
[17] Gao F, Zhu J, Wu L, et al. Soft contextual data augmentation for neural machine translation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 5539-5544.
[18] Bergmanis T, Goldwater S. Data augmentation for context-sensitive neural lemmatization using inflection tables and raw text[J]. arXiv preprint arXiv:1904.01464, 2019.
[19] Han S, Gao J, Ciravegna F. Data augmentation for rumor detection using context-sensitive neural language model with large-scale credibility corpus[C]//Proceedings of the 7th International Conference on Learning Representations LLD Workshop, 2019.
[20] Xie Q, Dai Z, Hovy E, et al. Unsupervised data augmentation for consistency training[J]. arXiv preprint arXiv:1904.12848, 2019.
[21] Cui Y, Che W, Liu T, et al. Pre-training with whole word masking for Chinese BERT[J]. arXiv preprint arXiv:1906.08101, 2019.

Funding

National Key Basic Research and Development Program of China (2018YFB1005103-3); National Natural Science Foundation of China (61806117); Natural Science Foundation of Shanxi Province (201901D111028)