Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (1): 104-116.
Information Extraction and Text Mining

Text Classification Based on Meta Learning for Unbalanced Small Samples

XIONG Wei, GONG Yu

Abstract

To address the difficulty of transferring the semantics and context of text information, this paper proposes an improved dynamic convolutional neural network based on meta learning and an attention mechanism. First, cross-category classification is performed on the underlying distribution features of the text, so that textual information transfers better across classes. Second, an attention mechanism is introduced to improve the traditional convolutional network and strengthen its feature extraction ability, and balance variables are generated by encoding information from the original dataset to reduce the impact of data imbalance. Finally, a bi-level optimization method lets the model optimize its network parameters automatically. Experimental results on the general text classification dataset THUCNews show that the proposed method improves accuracy by 2.27% and 3.26% in the 1-shot and 5-shot settings, respectively; on the IMDb dataset, accuracy improves by 3.28% and 3.01%, respectively.
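The abstract describes two generic building blocks that can be sketched independently of the paper: dynamic convolution, where attention weights computed from the input mix several candidate kernels, and bi-level (MAML-style) optimization, where an inner loop adapts parameters on a support set and an outer loop updates the shared initialization from the query-set loss. Below is a minimal PyTorch (2.x) sketch of these two ideas only; all class, function, and parameter names are illustrative assumptions, not the authors' code, and the balance-variable mechanism mentioned in the abstract is not reproduced.

# Minimal sketch, not the authors' implementation: dynamic 1D convolution with
# attention over candidate kernels, plus one MAML-style bi-level update.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Mixes num_kernels candidate kernels with attention weights from the input."""
    def __init__(self, in_ch, out_ch, kernel_size, num_kernels=4):
        super().__init__()
        self.kernels = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size) * 0.02)
        self.attn = nn.Linear(in_ch, num_kernels)  # attention over kernels

    def forward(self, x):                                          # x: (batch, in_ch, seq_len)
        w = F.softmax(self.attn(x.mean(dim=-1)), dim=-1)           # (batch, K)
        mixed = torch.einsum('bk,koif->boif', w, self.kernels)     # per-sample effective kernel
        b, o, i, k = mixed.shape
        out = F.conv1d(x.reshape(1, b * i, -1),                    # grouped-conv trick:
                       mixed.reshape(b * o, i, k),                 # one group per sample
                       padding=k // 2, groups=b)
        return out.reshape(b, o, -1)

def maml_step(model, loss_fn, support, query, inner_lr=0.01, inner_steps=1):
    """Inner loop: adapt a copy of the parameters on the support set.
    Returns the query loss; the caller backpropagates it through the
    adaptation (outer loop) and steps the optimizer on the shared init."""
    params = dict(model.named_parameters())
    for _ in range(inner_steps):
        loss = loss_fn(torch.func.functional_call(model, params, support[0]),
                       support[1])
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
    return loss_fn(torch.func.functional_call(model, params, query[0]), query[1])

In an outer loop, one would average maml_step losses over a batch of episodes, call backward() on the mean, and step an optimizer on the shared initialization; how the paper's balance variables re-weight imbalanced episodes is specific to the original implementation and is not assumed here.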

Keywords

meta learning / few-shot learning / text classification / dynamic convolution / data imbalance

Cite this article

XIONG Wei, GONG Yu. Text Classification Based on Meta Learning for Unbalanced Small Samples. Journal of Chinese Information Processing, 2022, 36(1): 104-116.
