HRTNSC: Hybrid Representation-based Subjective and Objective Sentence Classification for Tibetan News

KONG Chunwei, LYU Xueqiang, ZHANG Le

Journal of Chinese Information Processing, 2022, Vol. 36, Issue (12): 94-103, 114.
Information Processing for Ethnic, Cross-border, and Neighboring Languages

Abstract

To address the practical need for subjective and objective classification of Tibetan news, this paper takes Tibetan news text as its object of study and proposes a hybrid representation-based subjective and objective sentence classification model (HRTNSC). First, the model enriches the semantic information of its input by fusing syllable-level features with word-level features of the words that contain the current syllable. The fused feature vectors are then fed into a BiLSTM+CNN network for semantic extraction. Finally, a Softmax classifier assigns each sentence to the subjective or objective class. Test results show that, with the feature combination of Word2Vec syllable vectors, BERT syllable vectors, and attention-weighted word vectors, HRTNSC achieves its best F1 score of 90.84%, outperforming the comparison models. The model classifies subjective and objective sentences fairly effectively and has practical application value.
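
The fusion step described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The dot-product attention scoring, the use of the Word2Vec syllable vector as the attention query, the zero-vector fallback for syllables with no matching word, and all function names are assumptions made for illustration. The BiLSTM+CNN encoder and the Softmax classifier are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pooled_word_vector(query, word_vecs):
    """Attention-weighted sum of the vectors of words containing the syllable.

    Assumption: weights come from dot-product scores against `query`,
    and word vectors share the query's dimension.
    """
    if not word_vecs:
        # Hypothetical fallback: no lexicon word contains this syllable.
        return [0.0] * len(query)
    weights = softmax([dot(query, w) for w in word_vecs])
    dim = len(word_vecs[0])
    return [sum(wt * w[i] for wt, w in zip(weights, word_vecs)) for i in range(dim)]


def hybrid_representation(w2v_syl, bert_syl, word_vecs):
    """Concatenate Word2Vec syllable, BERT syllable, and pooled word vectors."""
    pooled = attention_pooled_word_vector(w2v_syl, word_vecs)
    return w2v_syl + bert_syl + pooled


# Toy vectors for one syllable; real embeddings would be much larger.
w2v_syl = [1.0, 0.0]
bert_syl = [0.5, 0.5, 0.5]
word_vecs = [[1.0, 0.0], [0.0, 1.0]]  # two words that contain the syllable
fused = hybrid_representation(w2v_syl, bert_syl, word_vecs)
print(len(fused))
```

In this toy case the first word vector is more similar to the syllable query, so it receives the larger attention weight. A sentence would be represented as the sequence of such fused vectors, one per syllable, before being passed to the encoder.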

Keywords

subjective and objective classification / hybrid representation / syllable-level features / word-level features

Cite this article

KONG Chunwei, LYU Xueqiang, ZHANG Le. HRTNSC: Hybrid Representation-based Subjective and Objective Sentence Classification for Tibetan News. Journal of Chinese Information Processing. 2022, 36(12): 94-103,114

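Funding

National Natural Science Foundation of China (61671043); Beijing Natural Science Foundation (4212020); Open Project Fund of the Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation / Key Laboratory of Tibetan Information Processing, Ministry of Education (2019Z002); Qinghai Provincial Key Research and Development Program (2022-ZJ-T02)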