Abstract
Determining the tariff rate of commodities for duty collection is one of the most important functions of Customs, playing a key role in national fiscal revenue and redistribution. The pre-trained language model BERT has achieved state-of-the-art results on many natural language processing tasks, but it performs poorly on the customs import and export tariff rate detection (text classification) task because of the peculiarities of customs text data. To address this problem, we propose an improved pre-trained language model, CC-BERT, which introduces two new pre-training strategies: a full-factor masking strategy and a task that predicts whether the specification model is aligned with the declaration elements (NCA). CC-BERT reduces the negative effect of the fixed text order on model performance and strengthens the connection between the specification model and the declaration elements. Experimental results on two real customs tariff rate detection tasks show that the method is more robust and outperforms the baseline models, reaching F1 scores of 90.52% and 80.10% on the two datasets, respectively.
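The abstract only names the two pre-training strategies; the sketch below is a minimal illustration of how they could be set up, not the paper's actual code. It assumes declaration elements are stored as `key:value` pairs separated by `|`, and the helper names `full_factor_mask` and `make_nca_pair`, the 15% masking rate, and the 50/50 negative sampling are hypothetical defaults borrowed from BERT-style pre-training: full-factor masking hides an entire declaration element at once, and NCA builds binary-labeled pairs indicating whether a specification model matches its declaration elements.

```python
import random

MASK = "[MASK]"


def full_factor_mask(declaration, mask_prob=0.15, sep="|"):
    """Mask whole declaration elements (key:value pairs) rather than single characters."""
    elements = declaration.split(sep)
    masked, targets = [], []
    for elem in elements:
        if random.random() < mask_prob:
            # One [MASK] per character keeps the sequence length unchanged
            # under character-level Chinese tokenization.
            masked.append(MASK * len(elem))
            targets.append(elem)      # ground truth for the masked-element loss
        else:
            masked.append(elem)
            targets.append(None)      # element left intact, nothing to predict
    return sep.join(masked), targets


def make_nca_pair(spec_model, declaration, corpus, neg_prob=0.5):
    """Build one (specification model, declaration elements, label) pair for NCA."""
    if random.random() < neg_prob:
        # Negative sample: pair the specification model with a random,
        # unrelated declaration, analogous to NSP's random next sentence.
        return spec_model, random.choice(corpus), 0
    return spec_model, declaration, 1


if __name__ == "__main__":
    decl = "品牌:ACME|型号:X100|材质:不锈钢|用途:工业阀门"
    print(full_factor_mask(decl, mask_prob=0.5))
    print(make_nca_pair("X100 不锈钢工业阀门", decl,
                        corpus=["品牌:OTHER|型号:Z9|材质:塑料|用途:家用"]))
```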
Key words
pre-trained language model /
tax rate detection /
structured text
Funding
National Key Research and Development Program of China (2018YFC0910500); National Natural Science Foundation of China (61425002, 61751203, 61772100, 62076045)