Abstract
The task of summarizing case-related public opinion news is to extract the important information from news texts about a specific judicial case and present it as a short summary, which helps the relevant personnel quickly grasp the state of public opinion. Compared with open-domain text summarization, this task usually involves specific case elements, which provide important guidance for summary generation. Therefore, a case-related public opinion news summarization method that incorporates case elements is proposed within a deep learning framework. First, a case-related public opinion news summarization dataset is constructed and the relevant case elements are defined. Then, through an attention mechanism, the case element information is integrated into the two-layer (word-level and sentence-level) encoding of the news text, producing a news text representation that carries case element information. Finally, a multi-feature classification layer is used to classify the sentences. Experiments on the constructed dataset show that the proposed method outperforms the baseline models, confirming its effectiveness.
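The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so plain dot-product attention is assumed here, and the function names (`attend_case_elements`, `fuse_sentence_with_elements`) and the toy vectors are hypothetical. The sketch weights a set of case element vectors by their similarity to one sentence vector and appends the weighted average to the sentence representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_case_elements(sentence_vec, element_vecs):
    """Attend from one sentence vector over the case element vectors.

    Assumption: dot-product scoring; the paper may use a different scorer.
    Returns (weights, context), where context is the weighted average
    of the case element vectors.
    """
    weights = softmax([dot(sentence_vec, e) for e in element_vecs])
    dim = len(sentence_vec)
    context = [
        sum(w * e[i] for w, e in zip(weights, element_vecs))
        for i in range(dim)
    ]
    return weights, context


def fuse_sentence_with_elements(sentence_vec, element_vecs):
    """Concatenate the sentence vector with its case element context."""
    _, context = attend_case_elements(sentence_vec, element_vecs)
    return list(sentence_vec) + context


# Toy example (hypothetical 2-dimensional vectors, not real embeddings).
sentence = [1.0, 0.0]
elements = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend_case_elements(sentence, elements)
fused = fuse_sentence_with_elements(sentence, elements)
```

In an extractive setting, a fused vector of this kind would be passed to a sentence classifier that decides whether the sentence belongs in the summary; the concatenation used here is only one simple way to combine the two sources of information.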
Keywords
case-related public opinion summarization /
case elements /
two-layer encoding /
multi-feature classification
Funding
National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Yunnan Province High-Tech Industry Special Project (201606)