Tibetan Text Sentiment Classification Combining Syllables and Words

MENG Xianghe, YU Hongzhi

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 2: 80-86.
Section: Information Processing for Ethnic, Cross-Border, and Neighboring Languages

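Abstract (Chinese version, translated)

Applying deep neural network models to Tibetan text sentiment classification has produced good results, but Tibetan comment texts are short, which makes their features sparse and prevents deep learning models from extracting comprehensive semantic features. This paper proposes a method that uses both Tibetan syllables and Tibetan words as the basic units of text representation and applies CNN, BiLSTM, and Multi-Head Self-Attention to classify the sentiment of Tibetan comment texts. The experiments first vectorize the syllables and words, then use a multi-kernel convolutional neural network, BiLSTM, and Multi-Head Self-Attention, respectively, to capture multi-dimensional internal features of the Tibetan text, and finally concatenate the features and pass them through a fully connected network with a Softmax activation to complete the sentiment classification. The results show that, on the test corpus used in this paper, the model that fuses syllable and word features achieves higher classification accuracy than the syllable-based model and the word-based model.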
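As a rough illustration of the syllable-level representation, the sketch below splits Tibetan text into syllables and maps them to integer ids ready for an embedding lookup. It assumes that syllables are delimited by the tsheg mark (U+0F0B) and clauses by the shad mark (U+0F0D); the helper names `split_syllables`, `build_vocab`, and `encode`, the padding length, and the sample phrase are hypothetical and not taken from the paper. Word-level units would come from a separate Tibetan word segmenter, which is not shown here.

```python
import re

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause-ending mark


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg, shad, and whitespace."""
    parts = re.split("[" + TSHEG + SHAD + r"\s]+", text)
    return [p for p in parts if p]


def build_vocab(sequences):
    """Assign an integer id to every unit; ids 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for unit in seq:
            vocab.setdefault(unit, len(vocab))
    return vocab


def encode(units, vocab, max_len):
    """Map units to ids, truncating or padding to max_len."""
    ids = [vocab.get(u, vocab["<unk>"]) for u in units][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical sample: a common four-syllable greeting.
sample = "བཀྲ་ཤིས་བདེ་ལེགས།"
syllables = split_syllables(sample)
vocab = build_vocab([syllables])
ids = encode(syllables, vocab, max_len=6)
```

The same `build_vocab` and `encode` helpers could be reused for word sequences, so the syllable branch and the word branch each get their own id sequence for the same comment.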

Abstract

Deep neural network models have achieved good results in the sentiment classification of Tibetan texts. To address the sparse features of short Tibetan comment texts, this paper proposes a method that uses both syllables and words as the units of Tibetan text representation, which are fed into CNN, BiLSTM, and Multi-Head Self-Attention to classify the sentiment of Tibetan comment texts. Specifically, after vectorizing the syllables and words, we use multi-kernel convolutional neural networks, BiLSTM, and Multi-Head Self-Attention to obtain multi-dimensional internal features of the Tibetan text. Finally, we concatenate the features and complete the sentiment classification through a fully connected layer with a Softmax activation function. The experimental results show that, on the test corpus used in this paper, the proposed method achieves higher accuracy than the syllable-based model and the word-based model.
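The fusion step described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only a multi-kernel convolution with max-over-time pooling for each branch, feature concatenation, and a Softmax fully connected layer. The BiLSTM and Multi-Head Self-Attention branches are omitted, and the embedding size, kernel widths, sequence lengths, and random weights are all hypothetical.

```python
import math
import random


def conv1d_max_pool(seq, kernel):
    """Slide one kernel (k x d) over seq (n x d), apply ReLU, max-pool over time."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(seq) - k + 1):
        s = 0.0
        for i in range(k):
            s += sum(w * x for w, x in zip(kernel[i], seq[start + i]))
        best = max(best, s)
    return best


def multi_kernel_features(seq, kernels):
    """One pooled feature per kernel; kernels may have different widths."""
    return [conv1d_max_pool(seq, kernel) for kernel in kernels]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classify(fused, weights, biases):
    """Fully connected layer followed by Softmax over the class logits."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


dim = 4                                   # hypothetical embedding size
syll_seq = rand_matrix(6, dim)            # stand-in syllable embeddings
word_seq = rand_matrix(3, dim)            # stand-in word embeddings
syll_kernels = [rand_matrix(k, dim) for k in (2, 3)]
word_kernels = [rand_matrix(k, dim) for k in (1, 2)]

syll_feats = multi_kernel_features(syll_seq, syll_kernels)
word_feats = multi_kernel_features(word_seq, word_kernels)
fused = syll_feats + word_feats           # feature concatenation
weights = rand_matrix(2, len(fused))      # two sentiment classes (assumed)
probs = classify(fused, weights, [0.0, 0.0])
```

Concatenating the two branches lets the classifier see fine-grained syllable patterns and coarser word patterns for the same short comment, which is the motivation the abstract gives for combining them.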

Keywords

Tibetan text / sentiment classification / Tibetan syllables / deep neural network

Cite this article

MENG Xianghe, YU Hongzhi. Tibetan Text Sentiment Classification Combining Syllables and Words. Journal of Chinese Information Processing. 2023, 37(2): 80-86

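Funding

Fundamental Research Funds for the Central Universities, Northwest Minzu University, 2021 (31920210087); Fundamental Research Funds for the Central Universities, Northwest Minzu University, 2020 (31920200116)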