A Systematic Survey of Automated Essay Scoring

XUE Siyuan, ZHOU Jianshe, REN Fuji

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (2): 1-14.
Survey

Abstract

With advances in computing, research on automated essay scoring has gained a richer set of techniques and application scenarios. This paper surveys that research. It first traces the development of automated essay scoring systems stage by stage. It then introduces the task settings, commonly used datasets, and evaluation methods of the field, and reviews its main techniques and models. Next, it describes the current state of Chinese automated essay scoring, and the challenges it faces, from two perspectives: essays written by native Chinese speakers and essays written by non-native Chinese speakers. Finally, it discusses the prospects for future research on automated essay scoring.

Key words

automated essay scoring / Chinese automated essay scoring / language intelligence technology / Chinese information processing

Cite this article

XUE Siyuan, ZHOU Jianshe, REN Fuji. A Systematic Survey of Automated Essay Scoring. Journal of Chinese Information Processing. 2023, 37(2): 1-14

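Funding

Science and Technology Innovation 2030 Major Project (2020AAA0109700); State Language Commission Projects (YB145-16, YB135-163); China Postdoctoral Science Foundation (2022M722231); National Social Science Fund of China (22CYY036)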