Abstract
Text sentiment classification is a classic task in natural language processing, with important applications in judging the sentiment polarity of text, public opinion monitoring, market feedback analysis, and product review mining. This paper proposes a new method for fine-grained text sentiment classification based on a pre-trained model. Document-level sentiment classification requires a model with both strong semantic summarization ability and robustness to noise. To this end, a BiLSTM network is used to adjust the weight of each Transformer layer in the pre-trained model, dynamically fusing the semantic representations of different granularities produced by the individual layers and thereby improving the expressiveness of the model's semantic space. To strengthen generalization, BiLSTM and BiGRU structures are combined in the downstream task to filter features from the resulting semantic vectors. With this model, we placed third in the Netizen Emotion Recognition during the Epidemic track of the CCF 2020 Science and Technology against the Epidemic · Big Data Charity Challenge. The F1 score on the final test set was 0.745 37, only 0.000 1 lower than the first-place model while using 67% fewer parameters, which demonstrates the feasibility and effectiveness of the method.
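The abstract's architecture can be summarized in code. Below is a minimal PyTorch sketch written under our own assumptions, not the authors' released implementation: the class name DynamicFusionClassifier, the bert-base-chinese checkpoint, the hidden sizes, and the mean pooling are all illustrative. It exposes every Transformer layer's output, uses a BiLSTM to score the layers per token and produce dynamic fusion weights, and then filters the fused sequence with a downstream BiLSTM and BiGRU before classification.

import torch
import torch.nn as nn
from transformers import BertModel


class DynamicFusionClassifier(nn.Module):
    """Sketch: dynamic layer-weight fusion of a pre-trained encoder + BiLSTM/BiGRU filtering."""

    def __init__(self, pretrained_name="bert-base-chinese", num_labels=3, hidden=256):
        super().__init__()
        # output_hidden_states=True exposes every Transformer layer's output.
        self.bert = BertModel.from_pretrained(pretrained_name, output_hidden_states=True)
        dim = self.bert.config.hidden_size

        # BiLSTM that reads the stack of layer representations for each token
        # and produces per-layer fusion weights (the "dynamic weight" adjustment).
        self.layer_lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.layer_score = nn.Linear(2 * hidden, 1)

        # Downstream BiLSTM + BiGRU used as feature filters on the fused sequence.
        self.bilstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.bigru = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of per-layer tensors, each of shape (B, T, dim).
        layers = torch.stack(outputs.hidden_states, dim=2)          # (B, T, L, dim)
        B, T, L, D = layers.shape

        # Score each layer per token with the BiLSTM, then softmax over layers.
        flat = layers.reshape(B * T, L, D)
        lstm_out, _ = self.layer_lstm(flat)                          # (B*T, L, 2*hidden)
        weights = torch.softmax(self.layer_score(lstm_out), dim=1)   # (B*T, L, 1)
        fused = (flat * weights).sum(dim=1).reshape(B, T, D)         # (B, T, dim)

        # Feature filtering with BiLSTM followed by BiGRU; masking omitted for brevity.
        seq, _ = self.bilstm(fused)
        seq, _ = self.bigru(seq)
        pooled = seq.mean(dim=1)                                     # simple mean pooling
        return self.classifier(pooled)

The per-token, per-layer softmax weighting is one reading of "dynamically adjusting the weight of each Transformer layer"; a sentence-level scheme that shares one weight per layer across all tokens would be an equally plausible interpretation of the paper's description.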
Key words
text sentiment classification /
pre-trained model /
dynamic weight /
long short-term memory network
Funding
Shenzhen University Graduate Education Reform Project, 2020 (860-000001050503)