A Survey to Text Summarization: Popular Datasets and Methods
HOU Shengluan1,2, ZHANG Shuhan1,2, FEI Chaoqun1,2
1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract Text summarization has become an essential means of acquiring knowledge from the mass of text documents on the Internet. Existing surveys of text summarization focus mostly on methods and do not review the experimental datasets. This survey concentrates on evaluation datasets and summarizes the public and private datasets together with the corresponding approaches. For each public dataset we record the data source, language, and means of access; for each private dataset we record the scale, access, and annotation methods. In addition, we provide the formal definition of text summarization corresponding to each public dataset. We analyze the experimental results of classical and recent text summarization methods on one specific dataset. We conclude with the current state of existing datasets and methods and some open issues concerning them.
Received: 28 June 2018