Research on Key Technology of Automatic Essay Scoring Based on Text Semantic Dispersion

WANG Yaohua 1; LI Zhoujun 1; HE Yueying 2; CHAO Wenhan 1; ZHOU Jianshe 3

Journal of Chinese Information Processing, 2016, Vol. 30, Issue 6: 173-181.
Survey


Abstract

This paper attempts to improve automatic essay scoring by exploiting text semantic dispersion. It proposes two representations of text semantic dispersion and gives their mathematical formulations. Based on existing methods, including the LDA model, paragraph vectors, and word vectors, four kinds of text semantic dispersion representations are extracted and applied to automatic essay scoring. The paper vectorizes text semantic dispersion from a statistical point of view, matricizes it from the perspective of decentering, and compares three methods: multiple linear regression, convolutional neural networks, and recurrent neural networks. Experimental results show that, on a validation set of 50 essays, adding text semantic dispersion features reduces the root mean square error between predicted and true scores by up to 10.99% and increases the Pearson correlation coefficient by up to a factor of 2.7. The representation is general, has no restriction on language, and can be extended to any language.
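The two dispersion forms described in the abstract, and the two reported evaluation metrics, can be sketched as follows. This is a minimal illustration assuming word embeddings (e.g., from word2vec) are already available; the function names and the choice of per-dimension standard deviation are assumptions for illustration, not the paper's exact formulas, which are given in the full text:

```python
import numpy as np

def dispersion_features(word_vectors):
    """Two illustrative text-semantic-dispersion representations.

    word_vectors: array of shape (n_words, dim), one embedding per word.
    Returns (dispersion_vec, decentered):
      - dispersion_vec: per-dimension standard deviation of the word
        vectors (a statistical, vector-form dispersion measure);
      - decentered: each word vector minus the document centroid
        (a decentered, matrix-form dispersion measure).
    """
    X = np.asarray(word_vectors, dtype=float)
    centroid = X.mean(axis=0)          # semantic "center" of the essay
    dispersion_vec = X.std(axis=0)     # spread around the center, per dimension
    decentered = X - centroid          # matrix form: decentered word vectors
    return dispersion_vec, decentered

def rmse(y_true, y_pred):
    """Root mean square error between predicted and true scores."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between predicted and true scores."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

In this sketch, `dispersion_vec` is the kind of feature that could be appended to a multiple-linear-regression input, while the `decentered` matrix is the kind of two-dimensional input a CNN or RNN scorer could consume.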


Key words

Automatic Essay Scoring / semantic dispersion / neural network

Cite this article

WANG Yaohua; LI Zhoujun; HE Yueying; CHAO Wenhan; ZHOU Jianshe. Research on Key Technology of Automatic Essay Scoring Based on Text Semantic Dispersion. Journal of Chinese Information Processing, 2016, 30(6): 173-181.


Funding

National Natural Science Foundation of China (61170189, 61370126, 61202239, U1636211); National 863 Program of China (2015AA016004, 2014AA015105); Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016001)