Graph Convolution Approach to News Text Classification Combining Semantic Relation and Syntactic Dependency

SUN Hong, LU Xinrong, XU Guanghui, HUANG Xueyang, REN Libo

Journal of Chinese Information Processing ›› 2023, Vol. 37 ›› Issue (7): 91-101.
Information Extraction and Text Mining


Graph Convolution Approach to News Text Classification Combining Semantic Relation and Syntactic Dependency

  • SUN Hong1, LU Xinrong1, XU Guanghui1,2, HUANG Xueyang1, REN Libo1

Abstract

Graph convolutional networks (GCNs) have been widely applied to text classification, but a GCN builds its text graph solely from the co-occurrence of words, ignoring the regularities of language itself, such as semantic and syntactic relations, and it is not good at extracting contextual and sequential features of text. To address these problems, this paper proposes a text classification model, SEB-GCN, which adds a syntactic text graph and a semantic text graph on top of the word co-occurrence graph, and then introduces ERNIE and a residual two-layer BiGRU network to learn text features in greater depth, thereby improving classification performance. Experimental results show that on four news datasets, SEB-GCN improves classification precision over other models by 4.77%, 4.4%, 4.8%, 3.4%, and 3%, respectively, and also converges noticeably faster.
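The word co-occurrence text graph that SEB-GCN extends, with PMI-weighted word-word edges as popularized by TextGCN, and a single graph-convolution step can be sketched in plain Python. This is an illustrative sketch under assumed simplifications (fixed sliding window, unweighted mean aggregation, no learned weight matrix), not the authors' implementation; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_graph(docs, window=3):
    """Build a word co-occurrence graph with PMI edge weights
    (TextGCN-style sketch, not the paper's implementation)."""
    word_count = Counter()   # windows containing each word
    pair_count = Counter()   # windows containing each word pair
    n_windows = 0
    for doc in docs:
        tokens = doc.split()
        for i in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[i:i + window])
            n_windows += 1
            for w in win:
                word_count[w] += 1
            for a, b in combinations(sorted(win), 2):
                pair_count[(a, b)] += 1
    edges = {}
    for (a, b), c in pair_count.items():
        # PMI = log( p(a,b) / (p(a) * p(b)) ), probabilities over windows
        pmi = math.log((c / n_windows) /
                       ((word_count[a] / n_windows) * (word_count[b] / n_windows)))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(a, b)] = pmi
    return edges

def gcn_layer(adj, feats):
    """One simplified graph-convolution step: each node averages its
    neighbours' feature vectors (no learned weights or nonlinearity)."""
    out = {}
    for node, nbrs in adj.items():
        acc = [0.0] * len(feats[node])
        for nb in nbrs:
            for j, v in enumerate(feats[nb]):
                acc[j] += v
        out[node] = [v / max(1, len(nbrs)) for v in acc]
    return out
```

A real model would stack such layers with learned weight matrices and nonlinearities; SEB-GCN additionally builds semantic and syntactic text graphs (e.g., from dependency-parse output) alongside this co-occurrence graph.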

Keywords

text classification / graph convolutional neural networks / semantic text graph / syntactic text graph / residual

Cite This Article

SUN Hong, LU Xinrong, XU Guanghui, HUANG Xueyang, REN Libo. Graph Convolution Approach to News Text Classification Combining Semantic Relation and Syntactic Dependency. Journal of Chinese Information Processing, 2023, 37(7): 91-101.


Funding

Natural Science Foundation of Shanghai (21ZR1450200)