QV-Electra: A Pre-trained Text Classification Model with Query-Value Attention Mechanism

SHAO Dangguo1,2, KONG Xianyuan1, XIANG Yan1,2, AN Qing1, HUANG Kun1, GUO Junjun1,2

Journal of Chinese Information Processing, 2023, Vol. 37, Issue 9: 92-97.
Section: Information Extraction and Text Mining


Abstract

Pre-trained language models acquire semantic representations from large-scale unlabeled corpora through specific pre-training tasks, so downstream tasks require only a small amount of data for fine-tuning, and such models outperform traditional deep learning models such as CNN, RNN, and LSTM. Common pre-trained language models such as BERT, Electra, and GPT are all built on the conventional Attention mechanism. Studies have shown that the QV-Attention mechanism, which introduces a Query-Value interaction, improves upon standard Attention. This paper proposes QV-Electra, which introduces QV-Attention into the pre-trained model Electra: while retaining Electra's pre-trained parameters, it improves performance by adding only 0.1% more parameters. Experimental results show that, under the same time budget, QV-Electra achieves better classification performance than traditional models and pre-trained models (BERT, Electra) of the same parameter scale.
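The Query-Value interaction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid-gate form of the interaction and the names `qv_attention` and `Wg` are assumptions, and the paper's exact interaction function may differ. The single small weight matrix `Wg` mirrors the abstract's point that only a tiny fraction of extra parameters is added on top of standard attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Standard scaled dot-product attention: the output is a weighted
    sum of values and does not interact with the query again."""
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # attention weights, rows sum to 1
    return A @ V                        # context vectors

def qv_attention(Q, K, V, Wg):
    """QV-Attention sketch: the attention output additionally interacts
    with the query through a learned sigmoid gate (one hypothetical
    choice of query-value interaction)."""
    ctx = attention(Q, K, V)
    gate = 1.0 / (1.0 + np.exp(-(Q @ Wg)))  # query-conditioned gate in (0, 1)
    return gate * ctx                        # element-wise query-value interaction

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = rng.standard_normal((3, n, d))
Wg = rng.standard_normal((d, d)) * 0.1       # the only extra parameters
out = qv_attention(Q, K, V, Wg)
print(out.shape)  # (4, 8)
```

The sketch keeps the standard attention path untouched and adds only the `d × d` gate matrix, which is the sense in which such a modification can reuse all of a pre-trained model's existing weights.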

Keywords

Electra pre-training model / Attention mechanism / QV-Attention mechanism / text classification

Cite This Article
SHAO Dangguo, KONG Xianyuan, XIANG Yan, AN Qing, HUANG Kun, GUO Junjun. QV-Electra: A Pre-trained Text Classification Model with Query-Value Attention Mechanism. Journal of Chinese Information Processing. 2023, 37(9): 92-97


Funding

National Natural Science Foundation of China (62266025); Yunnan Provincial Fundamental Research Program, General Project (202001AT070047)