Abstract: A topic sentence extraction method for news text is proposed. First, a location feature is derived from the distribution of topic sentences within news texts. Then, because the news title is closely related to the theme of the article, the overlap ratio between each sentence and the title is calculated. To better estimate the relevance between the title and a candidate topic sentence, maximum matching on a weighted bipartite graph is applied. Finally, the topic sentence is selected according to each sentence's rank score. Experimental results show that the proposed method achieves 75.9% in P@1 and 92.4% in P@3.
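The abstract does not spell out how the title overlap and the bipartite matching are computed. The Python sketch below is one possible reading, not the authors' implementation. It assumes that sentences and titles are already segmented into word lists. It uses character-level Jaccard overlap as a hypothetical stand-in for word similarity, and it normalizes the matching weight by the number of title words, which is also an assumption. The matching step is the standard Hungarian (Kuhn-Munkres) algorithm for maximum-weight bipartite matching. The location feature and the final rank-score weighting are not modeled here.

```python
"""Sketch: title/sentence relevance via maximum-weight bipartite matching.

Assumptions (not from the paper): inputs are pre-segmented word lists,
word similarity is character-level Jaccard overlap (hypothetical), and
the matching weight is normalized by the number of title words.
"""
from typing import List, Sequence


def overlap_ratio(sentence: Sequence[str], title: Sequence[str]) -> float:
    """Fraction of distinct title words that also occur in the sentence."""
    title_set = set(title)
    if not title_set:
        return 0.0
    return len(title_set & set(sentence)) / len(title_set)


def char_jaccard(a: str, b: str) -> float:
    """Hypothetical word similarity: Jaccard overlap of character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def max_weight_matching(w: List[List[float]]) -> float:
    """Total weight of a maximum-weight matching (Hungarian algorithm).

    w is an n x m weight matrix with n <= m; each row is matched to a
    distinct column. Weight is maximized by minimizing cost = -weight.
    """
    n, m = len(w), len(w[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j]: row matched to column j, 0 means free
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augment
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(w[p[j] - 1][j - 1] for j in range(1, m + 1) if p[j])


def matching_relevance(sentence: Sequence[str], title: Sequence[str]) -> float:
    """Max-matching weight between title and sentence words / title length."""
    if not sentence or not title:
        return 0.0
    # The Hungarian routine needs rows <= columns, so put the shorter side in rows.
    rows, cols = (title, sentence) if len(title) <= len(sentence) else (sentence, title)
    w = [[char_jaccard(a, b) for b in cols] for a in rows]
    return max_weight_matching(w) / len(title)
```

Unlike the plain overlap ratio, the matching gives partial credit to similar but non-identical words. It still lets each title word be matched to at most one sentence word, so a single repeated word cannot inflate the score. This version of the Hungarian algorithm runs in O(n²m) time, which is cheap for title-length inputs.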