Enhanced Data Augmentation Improves Generalisation of Automated Short Answer Scoring
CHEN Shuang, LI Li
College of Computer and Information Science, Southwest University, Chongqing 400715, China

Abstract Automated short answer grading (ASAG) systems use natural language processing to reduce the time educators spend on manual scoring. However, most ASAG systems share a weakness: students can deliberately deceive the model into awarding high scores by copying or slightly rewriting the reference answer. This paper explores a rule-based data augmentation approach to investigate the robustness of ASAG systems. Because natural language is discrete, the diversity of samples synthesized by rule-based data augmentation is limited. We therefore propose a knowledge distillation-based data augmentation strategy that stacks different individual data augmentation methods in parallel. In addition, we propose an ASAG system based on supervised contrastive learning, which enables the model to learn effective sentence representations. We evaluate our model on two datasets, one from the University of North Texas and one from SemEval-2013. Our model's performance improves substantially over the baselines.
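To make the supervised contrastive objective mentioned in the abstract concrete, the following is a minimal sketch of the commonly used formulation of that loss, written in plain Python. It is an illustration only and not necessarily the exact loss used in this paper: the function name `supervised_contrastive_loss`, the `temperature` default, and the toy embeddings are all assumptions introduced here. Answers that share a grade label are treated as positives for each other, and every other answer in the batch acts as a contrast.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    embeddings: list of equal-length float vectors (hypothetical sentence embeddings).
    labels: list of grade labels, one per embedding.
    temperature: assumed scaling hyperparameter, not taken from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in vec])

    n = len(normed)
    losses = []
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability assigned to the positives.
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))

    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two embeddings near the x-axis, two near the y-axis.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, [1, 1, 0, 0])
misaligned = supervised_contrastive_loss(toy_embeddings, [1, 0, 1, 0])
print(aligned < misaligned)
```

In this toy batch the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the property that encourages same-grade answers to lie close together in the representation space.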
Received: 21 March 2022