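Policy Identification Based on Pretrained Language Model

ZHU Nana, WANG Hang, ZHANG Jiale, SUN Yingwei

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 2: 104-110.
Information Extraction and Text Mining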


Abstract

Quantitative research on policy texts has attracted wide attention from policy scholars in recent years. Because its conclusions rest on objective data, it can largely overcome the subjectivity and arbitrariness of earlier qualitative policy analysis. Existing quantitative approaches to policy text analysis have two main drawbacks. First, policy texts are mostly collected by hand, so the datasets are small. Second, policy identification relies mainly on human experience, which amounts to biased induction from small datasets. To address these issues, this paper proposes a policy identification method based on a pretrained language model, which achieves good performance on a relatively large-scale policy text dataset.

Key words

pretraining / language model / policy identification

Cite this article

ZHU Nana, WANG Hang, ZHANG Jiale, SUN Yingwei. Policy Identification Based on Pretrained Language Model. Journal of Chinese Information Processing. 2022, 36(2): 104-110

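Funding

National Social Science Fund of China (15ATQ008); Art and Science Planning Project of the Heilongjiang Provincial Department of Culture (2019C027)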