Abstract
This paper proposes a Chinese-Vietnamese bilingual method for extracting perspective sentences from news that incorporates multiple features. First, to address the imbalance in labeled resources between Chinese and Vietnamese, a Chinese-Vietnamese bilingual word embedding model is constructed so that the rich labeled Chinese resources can compensate for the scarcity of labeled Vietnamese resources. Second, on the premise that the topic, position, and sentiment features of a sentence play an important role in perspective sentence classification, these features are integrated into the word vectors and the attention mechanism, respectively, combining the semantic information of a sentence with its sentiment, topic, and position features. Experiments show that the method effectively improves the accuracy of perspective sentence extraction from Vietnamese news.
Key words
perspective sentence extraction /
bilingual word embedding /
attention mechanism
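Funding
National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830100); National Natural Science Foundation of China (61732005, 61672271, 61761026, 61762056, 61866020); Yunnan Province High-Tech Industry Special Project (201606); Natural Science Foundation of Yunnan Province (2018FB104); Yunnan Province Science and Technology Talent Training Project (KKSY201703015)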