Abstract
To automatically summarize Chinese and Vietnamese news reports on the same event, this paper proposes a multi-feature fusion method for Chinese-Vietnamese bilingual news summarization. Sentences in news texts about the same event are related to one another, including across the two languages, and exploiting these relations helps generate the summary. Following this idea, the method first computes the degree of news-element co-occurrence between sentences and the similarity between sentences. These two features are then integrated into an undirected sentence graph, and a graph ranking algorithm is used to rank the sentences. Next, the ranking is adjusted using sentence position features. Finally, important sentences are selected and redundancy is removed to generate the summary. Summarization experiments on a Chinese-Vietnamese bilingual news document set show that the proposed method achieves good results, indicating that it is effective.
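The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the fusion formula, the ranking algorithm's parameters, or the position weighting, so everything below is an assumption made for illustration. Sentences are assumed to be already tokenized and mapped into a shared vocabulary across the two languages; Jaccard overlap stands in for both sentence similarity and news-element co-occurrence; a weighted PageRank-style iteration stands in for the graph ranking algorithm; and the names `alpha`, `beta`, `damping`, and `max_overlap` are hypothetical parameters.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_graph(tokens: List[Set[str]], elements: List[Set[str]],
                alpha: float = 0.5) -> List[List[float]]:
    """Undirected sentence graph whose edge weights fuse two features.

    Assumption: a linear mix of token similarity and news-element
    co-occurrence, weighted by the hypothetical parameter `alpha`.
    """
    n = len(tokens)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = jaccard(tokens[i], tokens[j])
            co = jaccard(elements[i], elements[j])
            w[i][j] = w[j][i] = alpha * sim + (1 - alpha) * co
    return w


def rank(w: List[List[float]], damping: float = 0.85,
         iters: int = 50) -> List[float]:
    """Weighted PageRank-style scores (assumed stand-in for the ranking step)."""
    n = len(w)
    if n == 0:
        return []
    out = [sum(row) for row in w]  # total edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(tokens: List[Set[str]], elements: List[Set[str]],
              positions: List[int], k: int = 2, beta: float = 0.5,
              max_overlap: float = 0.7) -> List[int]:
    """Return indices of up to k selected sentences, in selection order."""
    scores = rank(build_graph(tokens, elements))
    # Position adjustment (assumed form): earlier sentences get a larger boost.
    adjusted = [s * (1 + beta / (1 + p)) for s, p in zip(scores, positions)]
    order = sorted(range(len(tokens)), key=lambda i: adjusted[i], reverse=True)
    chosen: List[int] = []
    for i in order:
        # Redundancy removal: skip sentences too similar to ones already chosen.
        if all(jaccard(tokens[i], tokens[c]) <= max_overlap for c in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen
```

Calling `summarize(tokens, elements, positions)` with one token set, one news-element set, and one in-document position per sentence returns the indices of the selected sentences. In a real bilingual setting, the Jaccard similarity would presumably be replaced by a cross-lingual measure, since Chinese and Vietnamese sentences share few surface tokens.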
Keywords
bilingual news /
multi-feature /
undirected sentence graph /
automatic summarization
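Funding
National Natural Science Foundation of China (61472168, 61761026, 61732005, 61672271, 61762056); Yunnan Province High-Tech Industry Special Project (201606); Yunnan Province Science and Technology Innovation Talent Fund (2014HE001); Yunnan Provincial Natural Science Foundation (2018FB104)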