Abstract: This paper presents a strategy for automatically generating four types of vocabulary test questions: word listening, multi-word selection, word ordering, and single-word selection. A knowledge base is built to extract word-level features, including pronunciation, senses, grammatical properties, collocations, and common learner errors. Sentence analysis modules are also developed to identify grammatical constructions automatically and to estimate sentence difficulty. By selecting suitable sentences, target words, and distractors, 7263 vocabulary test questions are generated automatically in the experiment. Manual evaluation shows that 58% of the questions are judged completely reasonable; after slight manual modification, the acceptance rate rises to 75.7%.