Abstract
To extract temporal summaries from the massive stream of social media text about hot events, and to help users quickly grasp how such events evolve, this paper analyzes the development stages of hot events, mines the temporal and propagation features of social texts, and proposes a temporal summarization method that incorporates social propagation influence: LexRank Summarization with Timeline-Social Influence (LSTS). The summaries it extracts reflect the full course of an event's evolution and describe it more factually, and the method partly mitigates the adverse effect that unstructured social text has on sentence weighting. In experiments, LSTS achieves its best ROUGE scores when the weight ratio between the temporal and propagation features is 0.4: 44.23% on ROUGE-1, 34.78% on ROUGE-2, and 27.86% on ROUGE-S4. The results indicate that organizing texts along a timeline effectively tracks event evolution, and that incorporating temporal information and propagation influence improves the novelty and relevance of hot-event summaries.
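For readers unfamiliar with the ROUGE-1 and ROUGE-2 figures quoted above, the following is a minimal sketch of recall-oriented ROUGE-N on pre-tokenized text. It is not the paper's evaluation code: the function names `ngrams` and `rouge_n_recall` are hypothetical, whitespace tokenization is an assumption (Chinese text would need word segmentation first), and the paper may report a different ROUGE variant (for example F-score). ROUGE-S4, which counts skip-bigrams, is not covered here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram is credited at most as many times
    as it appears in the candidate summary.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # reference too short to contain any n-gram
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Illustrative (made-up) summaries, tokenized on whitespace.
candidate = "the quake hit the city".split()
reference = "the quake struck the city center".split()
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

In this toy example, four of the six reference unigrams and two of the five reference bigrams are matched by the candidate.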
Key words
hot events /
temporal summarization /
evolution stage /
time-series feature /
social influence
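Funding
Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (18YJAZH087); Independent Innovation Research Fund of Wuhan University of Technology (3120600100)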