Classifying scientific research project texts manually requires substantial human and material effort, so automating this task with intelligent methods is of considerable practical value. The core of text classification lies in extracting semantic features from the text: effective feature extraction helps build an accurate mapping from texts to categories. Existing methods typically classify based on either the whole text or a single part of it, which can introduce redundant or missing information.

Abstract

Aiming at structured scientific research project texts, this paper builds on pre-trained networks such as BERT and proposes two novel learning methods: a two-view text classification method based on a single cross-attention mechanism (Two-View Cross Attention, TVCA) and a multi-view method based on dual cross-attention mechanisms (Multi-View Cross Attention, MVCA). MVCA takes one main view of the project text (the project abstract) and two auxiliary views (research content; research purpose and significance), and uses two cross-attention mechanisms to extract feature vectors containing richer semantic information, further improving the performance of the classification model. We apply TVCA and MVCA to the classification of English papers from the Web of Science Meta-data corpus and of science and technology project texts from China Southern Power Grid; the experimental results show that both methods clearly outperform existing baselines in both classification accuracy and convergence speed.
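The core idea of fusing a main view with an auxiliary view via cross-attention can be sketched in PyTorch as below. This is a minimal illustration, not the authors' exact architecture: the module name, hidden size, number of heads, and the mean-pooling step are all assumptions; the paper's MVCA variant would add a second cross-attention branch for the second auxiliary view before classification.

```python
import torch
import torch.nn as nn

class TwoViewCrossAttention(nn.Module):
    """Sketch of two-view fusion: tokens of the main view (e.g. the project
    abstract) attend to tokens of an auxiliary view (e.g. research content),
    and the fused representation is pooled for classification."""

    def __init__(self, hidden=768, heads=8, num_classes=10):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, main_emb, aux_emb):
        # Queries come from the main view; keys/values from the auxiliary view,
        # so the main view selectively absorbs the auxiliary view's semantics.
        fused, _ = self.cross_attn(query=main_emb, key=aux_emb, value=aux_emb)
        fused = self.norm(main_emb + fused)   # residual connection + layer norm
        pooled = fused.mean(dim=1)            # mean-pool over the token axis
        return self.classifier(pooled)

# Toy usage with random tensors standing in for BERT encoder outputs
# (batch of 2, sequence length 128, hidden size 768).
model = TwoViewCrossAttention()
main = torch.randn(2, 128, 768)   # main view: project abstract
aux = torch.randn(2, 128, 768)    # auxiliary view: research content
logits = model(main, aux)
print(logits.shape)               # torch.Size([2, 10])
```

In practice both views would first be encoded by (possibly shared) BERT encoders, and the classification head would be trained end-to-end with cross-entropy loss.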
Key words
multi-view classification /
cross-attention mechanism /
text classification