1. School of Information Engineering, Minzu University of China, Beijing 100081, China; 2. Minority Languages Branch, National Language Resource and Monitoring Research Center, Minzu University of China, Beijing 100081, China
Abstract: This paper proposes an improved TextRank algorithm for Tibetan extractive text summarization. The method integrates information from an external corpus into TextRank in the form of word vectors. Each sentence is represented by the word vectors of its words, and the resulting sentence vector is used for sentence scoring. The highest-scoring sentences are selected and reordered to form the summary of the text. Experimental results show that the method effectively improves summary quality as measured by the ROUGE evaluation method.
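The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how word vectors are combined or how sentence similarity is computed, so the sketch assumes mean pooling of word vectors, cosine similarity as edge weight, and a standard damped power iteration. All function names, parameters, and the toy tokens below are hypothetical; real input would be segmented Tibetan sentences and vectors trained on an external corpus.

```python
import math

def sentence_vector(tokens, word_vectors, dim):
    """Mean of the vectors of known words (assumed pooling choice)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def textrank_scores(sent_vecs, damping=0.85, iterations=50):
    """Weighted TextRank scores computed by power iteration."""
    n = len(sent_vecs)
    if n == 0:
        return []
    # Edge weights: cosine similarity clipped at zero, no self-loops.
    w = [[max(cosine(sent_vecs[i], sent_vecs[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores

def summarize(sentences, word_vectors, dim, k=2):
    """Pick the k top-scoring sentences and restore document order."""
    vecs = [sentence_vector(s, word_vectors, dim) for s in sentences]
    scores = textrank_scores(vecs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

# Toy data: placeholder tokens and 2-dimensional vectors (hypothetical).
word_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
sentences = [["a", "b"], ["c"], ["a"], ["b"]]
summary = summarize(sentences, word_vectors, dim=2, k=2)
```

In this toy input, the sentence `["c"]` is nearly orthogonal to the others, so it receives little weight from its neighbours and is left out of the summary. Reordering by original position corresponds to the reordering step mentioned in the abstract.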