DistillBIGRU: Text Classification Model Based on Knowledge Distillation

HUANG Youwen, WEI Guoqing, HU Yanfang

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (4): 81-89.
Information Extraction and Text Mining

Abstract

Text classification models can be divided into pre-trained and non-pre-trained language models. Pre-trained language models achieve better classification performance, but their huge parameter counts and high demands on hardware computing power limit their use in many downstream tasks; non-pre-trained models have simpler structures, faster inference, and lower deployment requirements, but perform worse. To balance classification accuracy and computation cost, this paper proposes DistillBIGRU, a text classification model based on knowledge distillation. An MPNetGCN model is constructed as the teacher, a bidirectional gated recurrent unit (BiGRU) network is selected as the student, and the final DistillBIGRU model is obtained through knowledge distillation. On multiple datasets, the average classification accuracy of the teacher model MPNetGCN is 1.3% higher than that of BERTGCN, and DistillBIGRU achieves classification performance comparable to BERT-Base with roughly 1/9 of the latter's parameters.
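For orientation, the sketch below illustrates the kind of teacher-student setup the abstract describes: a BiGRU student trained against both the hard labels and the teacher's temperature-softened logits. It is a minimal PyTorch sketch under assumed hyperparameters (temperature T, mixing weight alpha) and a hypothetical BiGRUStudent module; the paper's actual DistillBIGRU architecture and distillation objective may differ.

```python
# Minimal sketch of soft-target knowledge distillation into a BiGRU student.
# Names, hyperparameters, and the dummy teacher logits are illustrative
# assumptions, not the paper's exact DistillBIGRU / MPNetGCN configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiGRUStudent(nn.Module):
    """Bidirectional GRU text classifier (hypothetical student architecture)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)              # [batch, seq_len, embed_dim]
        _, h_n = self.bigru(emb)                     # h_n: [2, batch, hidden_dim]
        feat = torch.cat([h_n[0], h_n[1]], dim=-1)   # concat forward/backward final states
        return self.classifier(feat)                 # [batch, num_classes] logits


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy plus temperature-scaled KL to the teacher's soft targets."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft


# Usage sketch: in the paper's setting, teacher_logits would come from the
# (frozen) MPNetGCN teacher; random tensors stand in for them here.
student = BiGRUStudent(vocab_size=30000, embed_dim=128, hidden_dim=256, num_classes=4)
token_ids = torch.randint(0, 30000, (8, 64))   # dummy batch of token ids
labels = torch.randint(0, 4, (8,))
teacher_logits = torch.randn(8, 4)
loss = distillation_loss(student(token_ids), teacher_logits, labels)
loss.backward()
```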

Key words

text classification / knowledge distillation / bidirectional gated recurrent unit

Cite this article

HUANG Youwen, WEI Guoqing, HU Yanfang. DistillBIGRU: Text Classification Model Based on Knowledge Distillation. Journal of Chinese Information Processing, 2022, 36(4): 81-89.


Funding

Science and Technology Research Project of the Education Department of Jiangxi Province (GJJ180443)
