Abstract
Machine reading comprehension aims to train models that can understand natural language and answer questions, so that real-world problems can be solved with less human effort. This paper proposes a Chinese reading comprehension dataset for a specific domain (the catering industry): Restaurant (Res). The initial data are collected from the Dianping application, taking user reviews of restaurants as the source texts; annotators pose questions on these reviews and annotate the answers. There are currently two versions of the Res dataset: in Res_v1 the answer to every question can be found in the user review, while Res_v2 adds questions whose answers do not appear in the review, better matching real-world scenarios. We apply the mainstream BiDAF, QANet, and BERT models to this dataset; the best accuracy achieved is only 73.78%, lagging far behind the human accuracy of 91.03%.
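The two dataset versions described above differ in whether unanswerable questions are included, which changes how accuracy is scored. A minimal sketch, assuming a hypothetical record layout where an empty answer list marks an unanswerable question (the scoring convention follows the usual span-extraction exact-match style; it is not the paper's official evaluation script):

```python
# Minimal sketch: exact-match scoring for a span-extraction dataset that,
# like Res_v2, mixes answerable and unanswerable questions.
# The record layout here is hypothetical: an empty gold-answer list marks
# an unanswerable question, for which the model must predict an empty span.

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """A prediction counts as correct if it matches any annotated answer
    after whitespace stripping; for unanswerable questions the model
    must predict the empty string."""
    pred = prediction.strip()
    if not gold_answers:  # unanswerable: expect no span
        return pred == ""
    return any(pred == g.strip() for g in gold_answers)

def accuracy(examples: list[dict]) -> float:
    """Fraction of examples whose prediction is an exact match."""
    correct = sum(exact_match(ex["prediction"], ex["answers"]) for ex in examples)
    return correct / len(examples)

if __name__ == "__main__":
    # Toy review-based examples (contents invented for illustration).
    examples = [
        {"prediction": "牛肉面", "answers": ["牛肉面"]},   # answerable, correct
        {"prediction": "很好吃", "answers": ["味道不错"]}, # answerable, wrong
        {"prediction": "", "answers": []},                 # unanswerable, correct
        {"prediction": "三楼", "answers": []},             # unanswerable, wrong
    ]
    print(f"accuracy = {accuracy(examples):.2f}")  # accuracy = 0.50
```

On a Res_v1-style dataset the unanswerable branch is never exercised; Res_v2 additionally penalizes models that always extract a span, which is one reason reported accuracy drops relative to answerable-only settings.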
Keywords
machine reading comprehension /
free-form question answering /
natural language processing
Funding
National Natural Science Foundation of China (61702080, 61806038, 61632011, 61772103)