CHE Lei, YANG Xiaoping, WANG Liang, LIANG Tianxin, HAN Zhenyuan. Text Structure Oriented Hybrid Hierarchical Attention Networks for Topic Classification[J]. Journal of Chinese Information Processing, 2019, 33(5): 93-102,112.
Text Structure Oriented Hybrid Hierarchical Attention Networks for Topic Classification
CHE Lei 1,2, YANG Xiaoping 1, WANG Liang 1, LIANG Tianxin 1, HAN Zhenyuan 1
1. School of Information, Renmin University of China, Beijing 100872, China; 2. School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
Abstract: To better exploit the logical structure features and organizational structure features of text in topic classification, this paper proposes a text structure oriented hybrid hierarchical attention network for the task. The logical structure typically comprises components such as the title and the body, while the organizational structure spans the character, word, and sentence levels. The model fuses text titles with text bodies to strengthen the contribution of logical structure features, and applies attention mechanisms at both the character-sentence and word-sentence levels to strengthen the contribution of organizational structure features. Experimental results on four datasets show that the proposed model improves the accuracy of topic classification.
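To make the described architecture concrete, the following is a minimal PyTorch sketch of a hybrid attention classifier of the kind the abstract outlines: a character-level branch and a word-level branch, each pooled by attention, fused with an attention-pooled title representation before classification. This is not the authors' released code; the bidirectional GRU encoders, additive attention, concatenation fusion, flattening of the sentence hierarchy into single sequences, and all dimensions and class counts are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the paper's reference implementation.
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Additive attention pooling over a sequence of hidden states."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                                   # h: (batch, seq, dim)
        scores = self.context(torch.tanh(self.proj(h)))     # (batch, seq, 1)
        alpha = torch.softmax(scores, dim=1)                # attention weights
        return (alpha * h).sum(dim=1)                       # (batch, dim)


class HybridHAN(nn.Module):
    """Char branch + word branch + title branch, fused by concatenation."""
    def __init__(self, n_chars, n_words, emb_dim=128, hid=64, n_classes=10):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, emb_dim)
        self.word_emb = nn.Embedding(n_words, emb_dim)
        # Bidirectional GRU encoders per granularity (an assumed encoder choice).
        self.char_enc = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.word_enc = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.title_enc = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.char_attn = AttentionPool(2 * hid)
        self.word_attn = AttentionPool(2 * hid)
        self.title_attn = AttentionPool(2 * hid)
        self.classifier = nn.Linear(3 * 2 * hid, n_classes)

    def forward(self, chars, words, title_words):
        # chars: (batch, n_char) ids; words/title_words: (batch, n_word) ids.
        c, _ = self.char_enc(self.char_emb(chars))
        w, _ = self.word_enc(self.word_emb(words))
        t, _ = self.title_enc(self.word_emb(title_words))
        doc = torch.cat([self.char_attn(c), self.word_attn(w),
                         self.title_attn(t)], dim=-1)
        return self.classifier(doc)


# Toy usage with random token ids:
model = HybridHAN(n_chars=5000, n_words=20000)
logits = model(torch.randint(0, 5000, (2, 80)),
               torch.randint(0, 20000, (2, 40)),
               torch.randint(0, 20000, (2, 12)))
print(logits.shape)  # torch.Size([2, 10])
```

In the paper's full model the character-sentence and word-sentence attention layers build sentence vectors that are themselves attended over at the document level; the sketch collapses that hierarchy into one attention layer per granularity for brevity.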