Abstract
Reading comprehension is an active research topic in natural language processing. Answering complex questions in reading comprehension requires not only extracting answer sentences but also fusing them to generate the final answer, yet most existing work concentrates on extraction. This paper studies sentence fusion for complex question answering and presents a method that balances the important information in the sentences, their relevance to the question, and the fluency of the output. The method first selects the parts to be fused based on sentence splitting and word salience. It then merges content shared across sentences using word alignment. Finally, it generates the fused sentence by integer linear programming over dependency relations, a bigram language model, and word salience. In experiments on reading comprehension datasets from past college entrance examinations, the method achieves an F-measure of 82.62% while better preserving the readability and informativeness of the results.
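The abstract does not give the optimization model, so the following is only a minimal sketch of what the final generation step could look like, under stated assumptions. It scores each candidate word subset by word salience plus bigram fluency, and requires that a kept word's dependency head is also kept. Exhaustive search stands in for an integer linear programming solver, and every name (`fuse_words`, `SALIENCE`, `HEADS`, `BIGRAM`, `UNSEEN`) and every numeric score is hypothetical rather than taken from the paper.

```python
from itertools import combinations

# Hypothetical toy data: one candidate sentence after alignment and merging.
WORDS = ["the", "method", "quickly", "fuses", "sentences"]
SALIENCE = [0.1, 0.9, 0.2, 1.0, 0.8]  # made-up word salience scores
HEADS = [1, 3, 3, -1, 3]              # dependency head index per word; -1 marks the root
BIGRAM = {                            # made-up bigram log-scores
    ("the", "method"): -0.1,
    ("method", "fuses"): -0.2,
    ("fuses", "sentences"): -0.2,
    ("method", "quickly"): -0.5,
    ("quickly", "fuses"): -0.4,
}
UNSEEN = -1.0  # assumed penalty for a bigram missing from the table


def fuse_words(words, salience, heads, bigram, max_len, weight=1.0):
    """Return the best-scoring word subset of at most max_len words.

    Exhaustive search replaces the ILP solver; this is only feasible
    for very short inputs.
    """
    best_score, best = float("-inf"), ()
    for k in range(1, max_len + 1):
        # combinations() yields indices in ascending order, so word order is kept.
        for subset in combinations(range(len(words)), k):
            kept = set(subset)
            # Dependency constraint: a kept word needs its head to be kept too.
            if any(heads[i] != -1 and heads[i] not in kept for i in subset):
                continue
            # Informativeness term: total salience of the kept words.
            score = sum(salience[i] for i in subset)
            # Fluency term: bigram scores of adjacent kept words.
            score += weight * sum(
                bigram.get((words[a], words[b]), UNSEEN)
                for a, b in zip(subset, subset[1:])
            )
            if score > best_score:
                best_score, best = score, subset
    return [words[i] for i in best]


print(fuse_words(WORDS, SALIENCE, HEADS, BIGRAM, max_len=3))
# ['method', 'fuses', 'sentences']
```

In a real system the same objective and constraints would be written as binary decision variables and handed to an ILP solver, since enumerating subsets grows exponentially with sentence length.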
Key words
reading comprehension /
complex problems /
sentence fusion /
text generation
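Funding
National High Technology Research and Development Program of China (863 Program) (2015AA015407); National Natural Science Foundation of China, Young Scientists Fund (61100138, 61403238); Natural Science Foundation of Shanxi Province (2011011016-2, 2012021012-1); Shanxi Province Research Project for Returned Overseas Scholars (2013-022); Shanxi Province University Science and Technology Development Project (20121117); Shanxi Province 2012 Merit-Based Science and Technology Activities Program for Returned Overseas Scholars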