Abstract
To accurately identify value tendencies in web texts, this paper proposes a multi-label text classification strategy that fuses the semantic knowledge of value labels. We first construct a values knowledge graph based on a theoretical system of values, and then build a multi-label text classification dataset for values. On this basis, we propose a multi-label text classification model that fuses the semantic knowledge of value labels in two ways. First, the semantic information of the labels is used for text representation learning, yielding the importance of each label for the different words in a text. Second, the semantic knowledge of the labels is used to compute semantic similarities between labels and texts, which are then fused with the classifier's predictions. Experimental results show that our method effectively addresses multi-label classification in the values domain and, in particular, alleviates the “tail-label” problem, achieving a precision of 62.44% at top@1 and a recall of 66.92% at top@3.
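The abstract describes two concrete fusion mechanisms: label-aware attention over words for text representation, and label-text semantic similarity fused with the classifier's scores. The PyTorch sketch below illustrates one plausible reading of both; it is not the authors' implementation, and the embedding dimension, the mean-pooling of the label-aware views, the sigmoid scoring, and the fixed mixing weight `alpha` are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (assumed design, not the paper's released code) of
# fusing label semantics into multi-label text classification.
import torch
import torch.nn.functional as F


class LabelFusionClassifier(torch.nn.Module):
    def __init__(self, vocab_size: int, num_labels: int, emb_dim: int = 300):
        super().__init__()
        self.word_emb = torch.nn.Embedding(vocab_size, emb_dim)
        # One embedding per value label, learned jointly with the model;
        # these vectors stand in for the labels' semantic knowledge.
        self.label_emb = torch.nn.Parameter(torch.randn(num_labels, emb_dim))
        self.classifier = torch.nn.Linear(emb_dim, num_labels)
        # Mixing weight between classifier scores and label-text similarity
        # (an assumed fusion scheme; the paper may weight these differently).
        self.alpha = 0.5

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        words = self.word_emb(token_ids)                     # (B, T, D)
        # (1) Label-aware representation: score each word against each
        # label, then attend over words separately per label.
        scores = torch.einsum("ld,btd->blt", self.label_emb, words)
        attn = F.softmax(scores, dim=-1)                     # (B, L, T)
        label_aware = attn @ words                           # (B, L, D)
        text_repr = label_aware.mean(dim=1)                  # (B, D)
        logits = self.classifier(text_repr)                  # (B, L)
        # (2) Label-text semantic similarity, fused with the classifier
        # output by a simple convex combination.
        sim = F.cosine_similarity(
            text_repr.unsqueeze(1), self.label_emb.unsqueeze(0), dim=-1
        )                                                    # (B, L)
        return self.alpha * torch.sigmoid(logits) + (1 - self.alpha) * sim


# Example: batch of 2 texts, 16 tokens each, 20 hypothetical value labels.
model = LabelFusionClassifier(vocab_size=30000, num_labels=20)
fused = model(torch.randint(0, 30000, (2, 16)))  # (2, 20) per-label scores
```

The fused scores can be thresholded per label, or ranked for top@k evaluation as in the paper's top@1/top@3 metrics. Giving each label its own attention distribution over the words is what lets a rare tail label latch onto the few words that signal it, rather than being drowned out by frequent labels in a shared text representation.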
Key words: values / label semantics / knowledge graph / multi-label text classification
Funding
Supported by the State Key Laboratory of Communication Content Cognition (A12002), the National Natural Science Foundation of China (62176074), and the Fundamental Research Funds for the Central Universities (2022FRFK0600XX).