Key Points Matching Based Scoring Method for Liberal Arts Subjective Questions

WANG Shijin1,3,4, GONG Jiefu1,3, WANG Yifa1,3, SONG Wei2, CHEN Zhigang1,3, WEI Si1,3

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (6): 165-178.
Natural Language Processing Applications

Abstract

Automatic scoring of subjective questions is an important part of smart-education innovation and has gradually become one of the hot topics at the intersection of artificial intelligence and education. For liberal arts subjective questions with key points, this paper proposes a key-point matching evaluation model based on multi-task learning: the model evaluates the matching level between a student's answer and each key point of the standard answer, and extracts the specific answer fragments corresponding to each key point. Together, the results of these two tasks characterize the student's mastery of each key point and serve as key features for automatic scoring. The key-point matching results are then combined with text-similarity features to score subjective answers automatically, substantially improving performance in general scoring scenarios without calibration data. Comparative experiments show that, compared with traditional features, the features based on key-point matching results are more important to the scoring model.
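To make the feature-combination idea concrete, the sketch below is a minimal illustration (not the authors' implementation) of how per-key-point matching results might be combined with a text-similarity feature into a final score; the three-level matching scheme, function names, and weights are all assumptions for illustration only.

```python
# Hedged sketch: combine key-point matching levels with a similarity
# feature to score an answer. Assumed match levels per key point:
# 0 = no match, 1 = partial match, 2 = full match.

def score_answer(match_levels, similarity, max_score=10.0,
                 kp_weight=0.8, sim_weight=0.2):
    """Return a score in [0, max_score] from per-key-point match levels
    and an overall answer/standard-answer similarity in [0, 1]."""
    if not match_levels:
        return 0.0
    # Fraction of key-point credit earned (each key point worth up to 2).
    kp_coverage = sum(match_levels) / (2 * len(match_levels))
    # Weighted combination of key-point coverage and text similarity.
    combined = kp_weight * kp_coverage + sim_weight * similarity
    return round(combined * max_score, 2)

# A student fully matching 2 of 3 key points and partially matching one,
# with an overall similarity of 0.7:
print(score_answer([2, 2, 1], 0.7))  # → 8.07
```

The higher weight on key-point coverage reflects the paper's finding that matching-based features matter more than traditional similarity features; the actual model learns this combination rather than using fixed weights.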


Key words

liberal arts subjective questions / key point matching evaluation / multi-task learning / general scoring

Cite this article

WANG Shijin, GONG Jiefu, WANG Yifa, SONG Wei, CHEN Zhigang, WEI Si. Key Points Matching Based Scoring Method for Liberal Arts Subjective Questions. Journal of Chinese Information Processing, 2023, 37(6): 165-178.


Funding

National Key Research and Development Program of China (2022YFC3303504); National Natural Science Foundation of China (61876113)