To generate a summary of a news event reported in both Chinese and Vietnamese, this paper proposes a multi-feature fusion method for bilingual news summarization that exploits the cross-lingual correlations between sentences in the news texts. First, the method analyzes two features: the co-occurrence degree of news elements and the similarity between sentences. Then, the two features are integrated into an undirected sentence graph, and a ranking algorithm is applied to order the sentences. Finally, the most important sentences are selected and redundancy is removed to produce the summary. Experiments on a Chinese-Vietnamese bilingual news dataset show that the proposed method achieves good results.
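The pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation, because the abstract does not say how the co-occurrence degree, the feature fusion, the ranking algorithm, or the redundancy check are computed. The sketch assumes that (1) every sentence, Chinese or Vietnamese, has already been mapped to a vector in a shared cross-lingual space and tagged with a set of news elements (e.g. persons, places); (2) the co-occurrence degree is the Jaccard overlap of two sentences' element sets; (3) the two features are fused linearly with a weight `alpha`; (4) ranking uses a TextRank-style weighted PageRank with damping `d`; and (5) redundancy is removed by skipping any candidate whose cosine similarity to an already-selected sentence reaches a `redundancy` threshold. All function names and parameters are illustrative.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def element_cooccurrence(a: Set[str], b: Set[str]) -> float:
    """HYPOTHETICAL co-occurrence degree: Jaccard overlap of two sentences' news elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(vectors, elements, alpha=0.5):
    """Symmetric (undirected) edge-weight matrix fusing both features.

    alpha is a hypothetical mixing weight between element co-occurrence and similarity.
    """
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clamp the similarity so that edge weights stay non-negative.
            sim = max(0.0, cosine(vectors[i], vectors[j]))
            weight = alpha * element_cooccurrence(elements[i], elements[j]) + (1 - alpha) * sim
            w[i][j] = w[j][i] = weight
    return w


def rank(w, d=0.85, tol=1e-6, max_iter=200):
    """TextRank-style weighted PageRank (an assumed choice of ranking algorithm).

    Iterates scores[i] = (1 - d) + d * sum_j (w[j][i] / degree[j]) * scores[j].
    """
    n = len(w)
    if n == 0:
        return []
    degree = [sum(row) for row in w]  # total edge weight incident to each node
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(w[j][i] / degree[j] * scores[j] for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def summarize(vectors, elements, k=2, alpha=0.5, redundancy=0.8) -> List[int]:
    """Return indices of up to k top-ranked sentences, skipping redundant candidates."""
    scores = rank(build_graph(vectors, elements, alpha))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    for i in order:
        if len(chosen) == k:
            break
        # Keep a candidate only if it is not too similar to any sentence already chosen.
        if all(cosine(vectors[i], vectors[j]) < redundancy for j in chosen):
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    # Toy input: sentences 0 and 2 play the "Chinese" side, 1 and 3 the "Vietnamese" side;
    # sentence 1 is a near-translation of sentence 0.
    vecs = [[1.0, 0.0, 0.0], [0.98, 0.02, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    elems = [{"Hanoi", "minister"}, {"Hanoi", "minister"}, {"trade"}, {"Hanoi", "trade"}]
    print(summarize(vecs, elems, k=2))
```

Running the script prints the indices of the selected sentences. Because sentence 1 nearly duplicates sentence 0 across the two languages, a request for three sentences keeps at most one of them. A linear fusion is used here only as the simplest way to combine two non-negative edge features; the paper may weight or combine them differently.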
Key words
bilingual news /
multi-feature /
undirected sentence graph /
automatic summarization