Google Search Result Classification Based on Pre-training
ZHANG Enwei1,2,3, HU Kai1,3, ZHUO Junjie2, CHEN Zhili2
1.School of Automation, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044; 2.Transsion, Department of AI Technology, Shanghai 201203; 3.Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing, Jiangsu 210044
Preliminary judgment of the results returned by a search engine is of substantial significance for optimizing the search process. As a dominant search engine, Google often returns highly complex results, and there is currently no effective way to judge search result pages accurately. This paper first constructs a dataset suited to Google search result classification, and then proposes a dual-channel model (DCFE) based on a pre-trained model to classify Google search results. The accuracy of our model on the self-built dataset reaches 85.74%, higher than that of existing models.
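The page does not give the internals of DCFE, but the general dual-channel idea it names (a pre-trained encoder channel and a second feature channel, fused before classification) can be sketched as follows. This is a minimal illustration with hypothetical stand-ins: mean pooling plays the role of a pre-trained encoder such as BERT, and a max-over-windows projection plays the role of a convolutional channel; all names and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrained_channel(token_embeddings):
    # Stand-in for a pre-trained encoder (e.g. BERT): mean-pool token vectors.
    return token_embeddings.mean(axis=0)

def conv_channel(token_embeddings, kernel, width=3):
    # Stand-in for a 1-D convolutional channel: project each n-gram window,
    # then max-pool over all windows (as in TextCNN-style models).
    n = token_embeddings.shape[0]
    windows = [token_embeddings[i:i + width].reshape(-1) @ kernel
               for i in range(n - width + 1)]
    return np.max(np.stack(windows), axis=0)

def dual_channel_logits(token_embeddings, kernel, W, b):
    # Fuse the two channels by concatenation, then apply a linear classifier.
    fused = np.concatenate([pretrained_channel(token_embeddings),
                            conv_channel(token_embeddings, kernel)])
    return fused @ W + b

dim, n_classes, n_filters = 8, 4, 6
tokens = rng.normal(size=(10, dim))          # 10 tokens, 8-dim embeddings
kernel = rng.normal(size=(3 * dim, n_filters))
W = rng.normal(size=(dim + n_filters, n_classes))
b = np.zeros(n_classes)

logits = dual_channel_logits(tokens, kernel, W, b)
print(logits.shape)  # (4,) -- one logit per search-result class
```

The design point the sketch illustrates is fusion by concatenation: each channel contributes features of a different granularity (sequence-level context vs. local n-gram patterns), and a single classifier head sees both.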
ZHANG Enwei, HU Kai, ZHUO Junjie, CHEN Zhili.
Google Search Result Classification Based on Pre-training. Journal of Chinese Information Processing. 2024, 38(3): 102-112