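For article and preposition errors in automatic grammatical error correction (GEC) of English essays, this paper proposes a sequence-labeling GEC method based on LSTM (Long Short-Term Memory). Noun number errors, verb form errors, and subject-verb agreement errors have open confusion sets, so for these this paper proposes a GEC method based on an N-gram voting strategy over ESL (English as a Second Language) and news corpora. On the CoNLL-2013 GEC data, the proposed method achieves an overall F1 score of 33.87%, exceeding the 31.20% of the top-ranked system, UIUC. In particular, the F1 score for article error correction is 38.05% (UIUC: 33.40%), and the F1 score for preposition error correction is 28.89% (UIUC: 7.22%).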
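Casting article correction as sequence labeling can be illustrated with a minimal sketch. The label set used here is an assumption: each token is tagged with the article that should precede it (`a`, `an`, `the`), or `O` when none belongs there. The helper names `to_article_labels` and `apply_article_labels` are also hypothetical, and neither the labels nor the helpers come from the paper. An LSTM tagger would be trained to predict these labels from the article-stripped words. Only the data transformation around such a tagger is shown.

```python
from typing import List, Tuple

# Hypothetical label set for illustration: the article that should precede a
# token, or "O" when no article belongs there.
ARTICLES = {"a", "an", "the"}
NO_ARTICLE = "O"


def to_article_labels(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Remove articles and tag each remaining token with the article that preceded it."""
    words: List[str] = []
    labels: List[str] = []
    pending = NO_ARTICLE
    for tok in tokens:
        if tok.lower() in ARTICLES:
            pending = tok.lower()  # remember the article for the next word
            continue
        words.append(tok)
        labels.append(pending)
        pending = NO_ARTICLE
    return words, labels


def apply_article_labels(words: List[str], labels: List[str]) -> List[str]:
    """Rebuild a sentence by inserting the (predicted) article before each labeled word."""
    out: List[str] = []
    for word, label in zip(words, labels):
        if label != NO_ARTICLE:
            out.append(label)  # labels are lowercase; original capitalization is not restored
        out.append(word)
    return out


# In this sketch, training pairs come from well-formed text: a tagger sees
# `words` and learns to predict `labels`.
words, labels = to_article_labels("He bought the car .".split())
print(words)   # ['He', 'bought', 'car', '.']
print(labels)  # ['O', 'O', 'the', 'O']
```

For a learner sentence such as `He bought car .`, every label is `O`. If a trained tagger predicted `the` for `car`, `apply_article_labels` would produce `He bought the car .`. Presumably, preposition errors could be handled the same way by swapping in a preposition label set.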
Abstract
To correct article and preposition errors in grammatical error correction (GEC), this paper proposes a sequence labeling method. For noun number, verb form, and subject-verb agreement errors, this paper proposes an N-gram voting strategy based on corpora collected from ESL (English as a Second Language) essays and news. On the CoNLL-2013 corpus, the proposed method achieves an overall F1 score of 33.87%, outperforming the top-ranked UIUC system (31.20%). It also achieves F1 scores of 38.05% for article errors and 28.89% for preposition errors, both exceeding UIUC's results (33.40% and 7.22%, respectively).
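One possible reading of the N-gram voting strategy is sketched below. It assumes that every n-gram window covering the suspect word casts one vote for the candidate whose n-gram is most frequent in the reference corpus. It also assumes that the writer's word is kept unless a single candidate wins outright. Several other details are illustrative assumptions as well: the vote rule, the window sizes (bigrams and trigrams), the helper names `build_counts` and `vote`, and the toy corpus. The abstract does not say how candidates for the open confusion sets are generated or how the ESL and news counts are combined. Here the candidates are simply passed in, and all counts would be pooled into one table.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGramCounts = Dict[Tuple[str, ...], int]


def build_counts(sentences: Sequence[Sequence[str]], max_n: int = 3) -> NGramCounts:
    """Count all 2..max_n-grams in a corpus (a stand-in for pooled ESL and news text)."""
    counts: Counter = Counter()
    for sent in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return dict(counts)


def vote(tokens: List[str], pos: int, candidates: List[str],
         counts: NGramCounts, max_n: int = 3) -> str:
    """Choose a word for tokens[pos]: each n-gram window covering pos votes
    for the candidate whose n-gram is most frequent in `counts`."""
    ballots: Counter = Counter()
    for n in range(2, max_n + 1):
        # every window of length n that contains position pos
        for start in range(pos - n + 1, pos + 1):
            end = start + n
            if start < 0 or end > len(tokens):
                continue
            scores: Dict[str, int] = {}
            for cand in candidates:
                gram = tuple(tokens[start:pos]) + (cand,) + tuple(tokens[pos + 1:end])
                scores[cand] = counts.get(gram, 0)
            best = max(scores.values())
            if best == 0:
                continue  # this window gives no evidence
            winners = [c for c, s in scores.items() if s == best]
            if len(winners) == 1:
                ballots[winners[0]] += 1  # tied windows abstain
    original = tokens[pos]
    if not ballots:
        return original
    top = max(ballots.values())
    leaders = [c for c, v in ballots.items() if v == top]
    # keep the writer's word unless one candidate wins outright
    return leaders[0] if len(leaders) == 1 else original


corpus = [s.split() for s in ["I have two books", "she has two books",
                              "I read a book", "these books are old"]]
counts = build_counts(corpus)
print(vote("I have two book".split(), 3, ["book", "books"], counts))  # books
```

In this toy example, both the bigram window `two _` and the trigram window `have two _` favor the plural, so `book` is corrected to `books`. Keeping the original word when there is no evidence or a tie is a conservative choice that trades recall for precision.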
Key words
grammatical error correction /
LSTM /
N-gram voting strategy /
ESL corpus