Abstract: This paper proposes a multi-feature method for extracting perspective sentences from Chinese-Vietnamese bilingual news. First, to address the imbalance between Chinese and Vietnamese resources, the method constructs a Chinese-Vietnamese bilingual word embedding model, so that the abundant annotated Chinese data can compensate for the scarcity of annotated Vietnamese data. Then, the sentiment, topic, and position features of sentences are integrated into the word vectors and the attention mechanism. Experiments show that the method effectively improves the accuracy of perspective sentence extraction from Vietnamese news.