Abstract
This paper proposes a feature-weighted topic sentence extraction method for news text, based on separate analyses of news titles and news bodies. First, a position feature is derived from the distribution of topic sentences within news articles. Then, because the title signals the main point of a news article, two title-based features are selected: the overlap ratio between a sentence and the title, and the relevance between them, where relevance is computed with a maximum matching algorithm on a weighted bipartite graph. Finally, sentences are ranked by their scores and the topic sentences are extracted accordingly. Experiments show that the method achieves a P@1 of 75.9% and a P@3 of 92.4%.
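The pipeline described in the abstract (position feature, title overlap, matching-based relevance, weighted ranking) can be illustrated with a minimal sketch. The abstract does not give the feature formulas, the word similarity measure, or the feature weights, so everything concrete here is an assumption made for illustration: the linear position decay, the character-level Dice similarity in `word_similarity`, the weights `(0.3, 0.3, 0.4)`, and all function names. Input is assumed to be pre-tokenized word lists.

```python
from functools import lru_cache


def position_score(index, total):
    """Earlier sentences score higher (linear decay is an assumed form)."""
    return 1.0 - index / total


def overlap_ratio(sentence, title):
    """Fraction of distinct title words that also occur in the sentence."""
    title_words = set(title)
    if not title_words:
        return 0.0
    return len(title_words & set(sentence)) / len(title_words)


def word_similarity(a, b):
    """Stand-in similarity: Dice coefficient over characters (an assumption)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def relevance(sentence, title):
    """Maximum-weight bipartite matching between sentence and title words.

    Edge weights are word similarities. The exact search below is
    exponential in the number of title words, so it only suits short titles.
    """
    s_words, t_words = tuple(sentence), tuple(title)
    if not s_words or not t_words:
        return 0.0

    @lru_cache(maxsize=None)
    def best(i, used):
        # i: next sentence word to place; used: bitmask of matched title words
        if i == len(s_words):
            return 0.0
        result = best(i + 1, used)  # option: leave s_words[i] unmatched
        for j, t_word in enumerate(t_words):
            if not used & (1 << j):
                gain = word_similarity(s_words[i], t_word)
                result = max(result, gain + best(i + 1, used | (1 << j)))
        return result

    # Normalize by title length so the score lies in [0, 1]
    return best(0, 0) / len(t_words)


def rank_sentences(sentences, title, weights=(0.3, 0.3, 0.4)):
    """Return sentence indices sorted by weighted feature score, best first."""
    w_pos, w_overlap, w_rel = weights  # assumed weights, not from the paper
    total = len(sentences)
    scores = [
        w_pos * position_score(i, total)
        + w_overlap * overlap_ratio(sent, title)
        + w_rel * relevance(sent, title)
        for i, sent in enumerate(sentences)
    ]
    return sorted(range(total), key=lambda i: -scores[i])
```

Taking the first one or three indices returned by `rank_sentences` corresponds to the candidates evaluated under P@1 and P@3. For longer titles, a polynomial-time assignment solver (for example the Hungarian algorithm) would be the natural replacement for the exhaustive search in `relevance`.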
Key words
feature weighting /
overlap ratio /
relevance degree /
weighted bipartite graph
Funding
Natural Science Foundation of Liaoning Province (20170540696); Shenyang Science and Technology Plan Project (17-231-1-82)