1. School of Computer Science, Qinghai Normal University, Xining, Qinghai 810008, China; 2. Tibetan Information Processing and Machine Translation Key Laboratory of Qinghai Province, Xining, Qinghai 810008, China
Abstract: This paper treats Tibetan character error detection as a classification problem. First, a syllable confusion subset is built from linguistic knowledge, and noise is added to each Tibetan sentence. A BERT-based deep bidirectional representation is then applied in the classification model. Two baseline models and test sets from different domains are also constructed. The experimental results show that the method outperforms both baseline models. With this method, sentence classification accuracy reaches 93.74%, and it reaches 83.6% on test sets from different domains. At the syllable level, the true negative rate is 74.53% and the false negative rate is 2.30%.
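The noise-injection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `add_noise`, the replacement probability `p`, and the confusion entries are all hypothetical, and the entries have not been linguistically vetted. The only fact relied on is that Tibetan syllables are delimited by the tsheg mark (U+0F0B). The BERT classifier itself is not shown.

```python
import random

TSHEG = "\u0f0b"  # Tibetan inter-syllable delimiter (tsheg)

# Hypothetical confusion subset: syllable -> easily confused syllables.
# Illustrative entries only; a real table would come from linguistic knowledge.
CONFUSION = {
    "བོད": ["པོད"],
    "སྐད": ["སྐང"],
}

def add_noise(sentence, confusion, rng, p=0.15):
    """Replace syllables with confusable ones at rate p (an assumed parameter).

    Returns (noisy_sentence, syllable_labels, sentence_label), where a label
    of 1 marks an injected error and 0 marks an untouched syllable.
    Empty segments (e.g. from a trailing tsheg) are dropped.
    """
    syllables = [s for s in sentence.split(TSHEG) if s]
    noisy, labels = [], []
    for syl in syllables:
        candidates = confusion.get(syl, [])
        if candidates and rng.random() < p:
            noisy.append(rng.choice(candidates))
            labels.append(1)
        else:
            noisy.append(syl)
            labels.append(0)
    return TSHEG.join(noisy), labels, int(any(labels))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    clean = "བོད" + TSHEG + "སྐད"
    print(add_noise(clean, CONFUSION, rng, p=0.5))
```

The sentence-level label supports the sentence classification setting, while the per-syllable labels support the syllable-level evaluation; how the paper actually derives its labels is not stated in the abstract.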