谭红叶,郭少茹,陈鑫,王素格,李茹,张虎,杨陟卓, 陈千,钱揖丽,王元龙,关勇,吕国英. 高考语文阅读理解自动答题系统[J]. 中文信息学报, 2022, 36(4): 166-174.
TAN Hongye, GUO Shaoru, CHEN Xin, WANG Suge, LI Ru, ZHANG Hu, YANG Zhizhuo, CHEN Qian, QIAN Yili, WANG Yuanlong, GUAN Yong, LV Guoying. The Question Answering System for Gaokao Chinese Reading Comprehension[J]. Journal of Chinese Information Processing, 2022, 36(4): 166-174.
The Question Answering System for Gaokao Chinese Reading Comprehension
TAN Hongye1, GUO Shaoru1, CHEN Xin1, WANG Suge1, LI Ru1,2, ZHANG Hu1, YANG Zhizhuo1, CHEN Qian1, QIAN Yili1, WANG Yuanlong1, GUAN Yong1, LV Guoying1
1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; 2. Key Laboratory of Ministry of Education for Computational Intelligence and Chinese Information Processing, Taiyuan, Shanxi 030006, China
Abstract: Machine Reading Comprehension (MRC), which requires machines to understand a text passage and answer questions about it, is a critical task in many real-world applications. This paper studies the key technologies of textual semantic representation, candidate sentence extraction, and language appreciation, and builds a system for answering multiple-choice questions and free-description questions. Experiments on Gaokao tests show that the system achieves a certain degree of accuracy on both question types. In future work, we will explore more advanced techniques, such as semantic representation, unified knowledge representation and aggregation, and transfer learning, to improve the MRC system's abilities in complex reasoning, inductive analysis, and language appreciation.