Abstract
Text classification is a fundamental task in natural language processing. Building on prototypical networks, this paper proposes a mean prototype network that integrates historical prototype vectors across time steps through a moving average, and combines it with a recurrent neural network to form a novel text classification model. The model uses a single-layer RNN to learn vector representations of texts, learns vector representations of categories with the mean prototype network, and uses the distance between text vectors and prototype vectors both to train the model and to predict text categories. Compared with existing neural text classification methods, the model exploits the feature-similarity relations between samples during training and prediction, and is characterized by a shallow network and few parameters. The proposed method achieves state-of-the-art classification accuracy on several public benchmark datasets for text classification.
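The two mechanisms the abstract describes — maintaining each class prototype as a moving average of per-batch class means, and classifying a text vector by its distance to the nearest prototype — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the momentum value, and the use of squared Euclidean distance are assumptions, and the RNN encoder that produces the text vectors is omitted.

```python
import numpy as np

def update_prototypes(protos, feats, labels, momentum=0.9):
    """Moving-average ("mean prototype") update: for each class seen in the
    batch, blend the stored prototype with the mean of that class's feature
    vectors. `momentum` controls how much history is retained (illustrative)."""
    new = protos.copy()
    for c in np.unique(labels):
        batch_proto = feats[labels == c].mean(axis=0)
        new[c] = momentum * new[c] + (1.0 - momentum) * batch_proto
    return new

def predict(protos, feats):
    """Assign each text vector to the class whose prototype is nearest,
    using squared Euclidean distance."""
    dists = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

In training, the distances to the prototypes would be turned into class probabilities (e.g. a softmax over negative distances) and optimized with cross-entropy; at test time only `predict` is needed.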
Key words
text classification /
mean prototype network /
self-ensemble
References
[1] Lewis D, Yang Y, Rose T G, et al. RCV1: A new benchmark collection for text categorization research[J]. Journal of Machine Learning Research, 2004, 5: 361-397.
[2] Pang B, Lee L. Opinion mining and sentiment analysis[J]. Foundations and Trends in Information Retrieval, 2008, 2(1): 1-135.
[3] Sahami M, Dumais S, Heckerman D, et al. A Bayesian approach to filtering junk email[C]//Proceedings of the AAAI'98 Workshop on Learning for Text Categorization, 1998, 62: 98-105.
[4] McCallum A, Nigam K. A comparison of event models for naive Bayes text classification[C]//Proceedings of the AAAI'98 Workshop on Learning for Text Categorization, 1998, 752(1): 41-48.
[5] Joachims T. Text categorization with support vector machines: learning with many relevant features[C]//Proceedings of the European Conference on Machine Learning. Springer, Berlin, Heidelberg, 1998: 137-142.
[6] Kim Y. Convolutional neural networks for sentence classification[C]//Proceedings of the Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[7] Johnson R, Zhang T. Effective use of word order for text categorization with convolutional neural networks[C]//Proceedings of the North American Chapter of the Association for Computational Linguistics, 2015: 103-112.
[8] Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification[C]//Proceedings of the Advances in Neural Information Processing Systems, 2015: 649-657.
[9] Conneau A, Schwenk H, Barrault L, et al. Very deep convolutional networks for text classification[C]//Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, 2017: 1107-1116.
[10] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the Advances in Neural Information Processing Systems, 2013: 3111-3119.
[11] Socher R, Perelygin A, Wu J, et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2013: 1631-1642.
[12] Dai A M, Le Q V. Semi-supervised sequence learning[C]//Proceedings of the Advances in Neural Information Processing Systems, 2015: 3079-3087.
[13] Xiao Y, Cho K. Efficient character-level document classification by combining convolution and recurrent layers[J/OL]. arXiv preprint arXiv: 1602.00367, 2016.
[14] Miyato T, Dai A M, Goodfellow I J, et al. Adversarial training methods for semi-supervised text classification[C]//Proceedings of the International Conference on Learning Representations, 2017.
[15] Miyato T, Maeda S, Ishii S, et al. Virtual adversarial training: a regularization method for supervised and semi-supervised learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1979-1993.
[16] Joulin A, Grave E, Bojanowski P, et al. Bag of tricks for efficient text classification[C]//Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, 2017: 427-431.
[17] Qiao C, Huang B, Niu G, et al. A new method of region embedding for text classification[C]//Proceedings of the International Conference on Learning Representations, 2018.
[18] Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning[C]//Proceedings of the Advances in Neural Information Processing Systems, 2017: 4077-4087.
[19] Tarvainen A, Valpola H. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results[C]//Proceedings of the Advances in Neural Information Processing Systems, 2017: 1195-1204.
[20] Laine S, Aila T. Temporal ensembling for semi-supervised learning[C]//Proceedings of the International Conference on Learning Representations, 2017.
[21] Zhang X, LeCun Y. Text understanding from scratch[J/OL]. arXiv preprint arXiv: 1502.01710, 2015.
[22] Yogatama D, Dyer C, Ling W, et al. Generative and discriminative text classification with recurrent neural networks[J/OL]. arXiv preprint arXiv: 1703.01898, 2017.
[23] Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch[C]//Proceedings of the NIPS Autodiff Workshop, 2017.
[24] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1532-1543.
[25] Kingma D P, Ba J. Adam: A method for stochastic optimization[C]//Proceedings of the International Conference on Learning Representations, 2015.
Funding
National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830100); National Natural Science Foundation of China (61732005, 61672271, 61562052, 61762056)