Abstract: Automated short answer grading (ASAG) uses natural language processing to reduce the time educators spend on manual scoring. Most current ASAG systems, however, share a weakness: students can deliberately cheat the model and obtain high scores by copying or slightly rewording the reference answer. This paper first explores a rule-based data augmentation approach to investigate the robustness of ASAG systems. Because natural language is discrete, the diversity of samples synthesized by rule-based data augmentation is limited. The paper therefore proposes a knowledge distillation-based data augmentation strategy that stacks different individual data augmentation methods in parallel. In addition, it proposes an ASAG system based on supervised contrastive learning, which enables the model to learn effective sentence representations. The system is evaluated on two public datasets, University of North Texas and SemEval-2013, and shows substantial performance improvements over the baseline models.
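The abstract does not state the exact training objective, so the following is only a minimal sketch of a supervised contrastive loss in its commonly used general form, not the authors' implementation. It assumes each answer has already been encoded as a sentence embedding and carries a discrete grade label; the function names `l2_normalize` and `supervised_contrastive_loss`, the `temperature` value, and the toy vectors are all hypothetical. For each anchor, embeddings with the same label are pulled together and all others are pushed apart.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Average supervised contrastive loss over anchors that have a positive.

    Positives for an anchor are the other samples with the same label;
    every other sample in the batch appears in the softmax denominator.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two answers near one direction, two near another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
consistent = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
inconsistent = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(consistent < inconsistent)
```

In this toy batch the loss is lower when the labels agree with the geometry of the embeddings than when they contradict it, which is the property that encourages same-grade answers to cluster in the representation space.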