BERT-CNN: A Hierarchical Patent Classifier Based on Pre-trained Language Model
(基于预训练语言模型的BERT-CNN多层级专利分类研究)

LU Xiaolei1 (陆晓蕾), NI Bin2 (倪斌)

Journal of Chinese Information Processing (中文信息学报), 2021, Vol. 35, Issue 11: 70-79.
Section: Information Extraction and Text Mining

Abstract

An accurate automatic patent classifier is important for intellectual property protection, patent management, and patent information retrieval, and can assist both patent inventors and patent examiners. Taking patent document classification as the research task, this paper uses the national patent application records released by the State Information Center of China as experimental data and proposes BERT-CNN, a hierarchical patent classification model based on a pre-trained language model. Experimental results show that BERT-CNN reaches 84.3% accuracy on this dataset, substantially outperforming other deep learning baselines such as Convolutional Neural Networks and Recurrent Neural Networks. The feature vectors extracted by BERT also represent lexical and semantic information more effectively than traditional Word2Vec embeddings. In addition, the paper discusses the differences between flat and hierarchical strategies in multi-level patent text classification.
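
To make the described architecture concrete, the following is a minimal sketch of one plausible way to combine BERT token-level features with a convolutional classification head, in the spirit of the BERT-CNN model. It is an illustrative assumption rather than the authors' reported configuration: the checkpoint (bert-base-chinese), kernel sizes, filter counts, dropout rate, and the example label set are all placeholders.

```python
# Sketch only: BERT token-level hidden states feed 1-D convolutions,
# max-over-time pooling, and a linear classifier (hyperparameters are illustrative).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertCNNClassifier(nn.Module):
    def __init__(self, num_classes, bert_name="bert-base-chinese",
                 num_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size           # 768 for the base model
        # One convolution per kernel width over the token dimension
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual token features from BERT: (batch, seq_len, hidden)
        tokens = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = tokens.permute(0, 2, 1)                      # (batch, hidden, seq_len) for Conv1d
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                         # class logits

# Hypothetical usage: assign a patent abstract to one of 8 top-level IPC sections (A-H).
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertCNNClassifier(num_classes=8)
batch = tokenizer(["一种锂离子电池正极材料的制备方法"], padding=True,
                  truncation=True, max_length=128, return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```

Under this reading, the hierarchical strategy mentioned in the abstract would chain such classifiers level by level (e.g., first predicting the top-level section, then the class within the predicted section), whereas a flat strategy trains a single classifier directly over the finest-grained labels.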

Key words

patent / text classification / BERT

Cite this article

LU Xiaolei, NI Bin. BERT-CNN: A Hierarchical Patent Classifier Based on Pre-trained Language Model. Journal of Chinese Information Processing, 2021, 35(11): 70-79.

Funding

Humanities and Social Sciences Fund of the Ministry of Education of China (18YJCZH117); Fundamental Research Funds for the Central Universities (20720191053)