SUN Xin, TANG Zheng, ZHAO Yongyan, ZHANG Yingjie. Hierarchical Networks with Mixed Attention for Text Classification [J]. Journal of Chinese Information Processing, 2021, 35(2): 69-77.
Hierarchical Networks with Mixed Attention for Text Classification
SUN Xin, TANG Zheng, ZHAO Yongyan, ZHANG Yingjie
Beijing Engineering Applications Research Center on High Volume Language Information Processing and Cloud Computing, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: Text classification is one of the core tasks in natural language processing. To better handle long text sequences, we propose the Hierarchical Networks with Mixed Attention (HMAN) model, which captures the important parts of a text on top of a hierarchical model. First, sentences and documents are encoded following the hierarchical structure of the document, with an attention mechanism applied at each level. Then, a global target vector and sentence-specific target vectors are extracted by max-pooling and used to encode the document vector. Finally, documents are classified according to the constructed document representation. Experimental results on open datasets and an industry text dataset show that the model achieves better classification performance, especially on long texts with hierarchical structure.
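The pipeline the abstract describes (word-level attention within each sentence, sentence-level attention across sentences, plus max-pooling to form the final document representation) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the random embeddings, the fixed target vectors, and the simple dot-product attention stand in for the learned encoders and parameters of the actual HMAN model.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(H, target):
    """Score each row of H against a target vector, softmax the scores,
    and return the attention-weighted sum of rows (a simplified stand-in
    for the paper's learned attention layers)."""
    scores = H @ target                        # (n,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ H                         # (d,)

# Toy document: 3 sentences x 4 words, embedding dimension 8 (all random
# here; a real model would use trained word embeddings and RNN encoders).
d = 8
doc = rng.normal(size=(3, 4, d))

# Hypothetical target vectors; in HMAN these are learned parameters.
word_target = rng.normal(size=d)
sent_target = rng.normal(size=d)

# Word-level attention -> one vector per sentence.
sent_vecs = np.stack([attention_pool(s, word_target) for s in doc])  # (3, d)

# Sentence-level attention -> attended document vector.
doc_attended = attention_pool(sent_vecs, sent_target)                # (d,)

# Max-pooling over sentence vectors -> global feature vector.
doc_maxpool = sent_vecs.max(axis=0)                                  # (d,)

# Final document representation: attended vector concatenated with the
# max-pooled one, which would then feed a softmax classifier.
doc_repr = np.concatenate([doc_attended, doc_maxpool])               # (2d,)
print(doc_repr.shape)
```

The concatenation is the "mixed" part: the attention branch keeps a weighted view of every sentence, while the max-pooling branch preserves the strongest individual features regardless of attention weight.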