Abstract
Reading comprehension question answering (Q&A) for the Chinese college entrance examination is considerably more difficult than general reading comprehension Q&A, and training data for this task is scarce, so current deep learning methods cannot achieve satisfactory results. To address these problems, this paper proposes an answer candidate sentence extraction method for college entrance examination reading comprehension that incorporates BERT semantic representations. First, an improved MMR algorithm is used to filter the paragraphs; second, a fine-tuned BERT model is applied to represent the sentences semantically; third, a softmax classifier is used to extract the answer candidate sentences; finally, the output of the BERT model is re-ranked by the PageRank algorithm. The recall and accuracy of the proposed method on the Chinese reading comprehension questions of the Beijing college entrance examination over the past ten years reach 61.2% and 50.1%, respectively, which demonstrates its effectiveness.
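The abstract outlines a four-stage pipeline: MMR paragraph filtering, fine-tuned BERT sentence representation, softmax candidate extraction, and PageRank re-ranking. Purely as an illustrative sketch, the Python code below implements the two classical components named there, greedy MMR filtering and PageRank re-ranking, over simple bag-of-words cosine similarities, with placeholder classifier scores standing in for the fine-tuned BERT + softmax stage; the whitespace tokenization, the MMR trade-off lam, the damping factor d, and the equal weighting of graph and classifier scores are assumptions made for illustration, not details taken from the paper.

# Minimal, self-contained sketch of the pipeline described in the abstract:
# MMR paragraph filtering -> (placeholder) sentence scoring -> PageRank re-ranking.
# Bag-of-words cosine, lam, d, and the 0.5/0.5 score fusion are illustrative assumptions.
import numpy as np
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = np.sqrt(sum(v * v for v in a.values())) * np.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_select(question: str, paragraphs: list, k: int = 3, lam: float = 0.7) -> list:
    """Greedy MMR: balance relevance to the question against redundancy
    with the paragraphs that have already been selected."""
    q_vec = Counter(question.split())
    p_vecs = [Counter(p.split()) for p in paragraphs]
    selected, remaining = [], list(range(len(paragraphs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(q_vec, p_vecs[i])
            redundancy = max((cosine(p_vecs[i], p_vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [paragraphs[i] for i in selected]

def pagerank_rerank(sentences: list, scores: list, d: float = 0.85, iters: int = 50) -> list:
    """Re-rank candidate sentences: build a similarity graph over them, run PageRank,
    then combine the graph score with the (placeholder) classifier score."""
    vecs = [Counter(s.split()) for s in sentences]
    n = len(sentences)
    W = np.array([[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
                  for i in range(n)])
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.full_like(W, 1.0 / n), where=row_sums > 0)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        pr = (1 - d) / n + d * (W.T @ pr)
    combined = 0.5 * pr + 0.5 * np.array(scores)  # assumed equal weighting of the two scores
    return [sentences[i] for i in np.argsort(-combined)]

# Toy usage: the selected paragraphs double as candidate sentences here, and the
# scores list stands in for softmax probabilities from a fine-tuned BERT classifier.
paras = ["the old house stood by the river", "he sold vegetables at the market",
         "the river rose in spring"]
top = mmr_select("what happened to the house by the river", paras, k=2)
print(pagerank_rerank(top, scores=[0.9, 0.4]))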
Key words
reading comprehension of college entrance examination /
automatic Q&A /
paragraph evaluation /
BERT /
PageRank
Funding
National Key Research and Development Program of China (2018YFB1005103); National Natural Science Foundation of China (61772324); General Project of the Shanxi Provincial Basic Research Program (20210302123469); Shanxi Provincial "1331 Project"