Sensitive Judicial Public Opinion Information Recognition with the Domain Terminology Dictionary

ZHANG Zefeng1,2, MAO Cunli1,2, YU Zhengtao1,2, HUANG Yuxin1,2, LIU Yiyang1,2

Journal of Chinese Information Processing ›› 2022, Vol. 36 ›› Issue (9): 76-83, 92.
Information Extraction and Text Mining


Abstract

Sensitive judicial public opinion information recognition aims to identify, from massive amounts of web text, sensitive public opinion related to the judicial domain. At present, little research targets this task. Compared with sensitive information recognition in the general domain, judicial public opinion is characterized by nonstandard descriptions, abundant redundant information, and numerous domain-specific terms, so general-purpose models are ill-suited to the task. To address this, this paper proposes a model for sensitive judicial public opinion information recognition that incorporates a domain terminology dictionary. First, a bidirectional recurrent neural network and a multi-head attention mechanism encode the public opinion text into a representation carrying attention weights. Second, the domain terminology dictionary serves as guiding knowledge for classification: a similarity matrix is built between the dictionary and the text representation, yielding a judicially sensitive text representation fused with the dictionary. Then a convolutional neural network encodes local information, and a multi-head attention mechanism extracts local features weighted by sensitivity. Finally, sensitive information in the judicial domain is recognized. Experimental results show that, compared with the Bi-LSTM Attention baseline model, the F1 score improves by 8%.
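The dictionary-fusion step described in the abstract can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function name, the choice of cosine similarity, and the max-then-softmax weighting are assumptions; only the idea of building a similarity matrix between domain-term embeddings and the text representation comes from the abstract.

```python
import numpy as np

def dictionary_guided_representation(text_emb, dict_emb):
    """Fuse domain-terminology knowledge into a text representation.

    text_emb: (seq_len, d) token representations of the opinion text
    dict_emb: (n_terms, d) embeddings of domain terminology entries
    Returns a (seq_len, d) representation re-weighted by how strongly
    each token matches the terminology dictionary.
    """
    # Cosine similarity matrix S: (seq_len, n_terms)
    t = text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + 1e-8)
    u = dict_emb / (np.linalg.norm(dict_emb, axis=1, keepdims=True) + 1e-8)
    S = t @ u.T
    # A token's sensitivity score: its best match against any dictionary term
    w = S.max(axis=1)                      # (seq_len,)
    # Softmax-normalize the scores and scale each token vector by its weight
    w = np.exp(w - w.max())
    w = w / w.sum()
    return text_emb * w[:, None]
```

Tokens that closely match a terminology entry thus contribute more to the downstream representation than unrelated tokens.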

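The local-feature stage (CNN encoding followed by attention pooling) can likewise be sketched. This is a simplified single-head illustration under stated assumptions: the filter shapes, ReLU activation, and the single scoring vector standing in for the paper's multi-head attention are all illustrative, not the authors' configuration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_local_features(H, filters, score_v):
    """CNN over encoded tokens, then attention pooling into one vector.

    H:       (seq_len, d)        encoded text (e.g. Bi-RNN outputs)
    filters: (n_filters, k, d)   1-D convolution filters of width k
    score_v: (n_filters,)        attention scoring vector (single head)
    """
    seq_len, d = H.shape
    n_filters, k, _ = filters.shape
    # Convolution: slide each filter over windows of k tokens (valid padding)
    F = np.empty((seq_len - k + 1, n_filters))
    for i in range(seq_len - k + 1):
        window = H[i:i + k]  # (k, d)
        F[i] = np.maximum(0.0, np.tensordot(filters, window, axes=([1, 2], [0, 1])))  # ReLU
    # Attention pooling: weight each local feature by its sensitivity score
    alpha = softmax(F @ score_v)           # (seq_len - k + 1,)
    return alpha @ F                       # (n_filters,)
```

The pooled vector would then feed a standard classification layer to decide whether the text contains judicially sensitive information.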

Key words

judicial public opinion / sensitive information / domain terminology dictionary / multi-head attention mechanism


Cite this article
ZHANG Zefeng, MAO Cunli, YU Zhengtao, HUANG Yuxin, LIU Yiyang. Sensitive Judicial Public Opinion Information Recognition with the Domain Terminology Dictionary. Journal of Chinese Information Processing. 2022, 36(9): 76-83,92


Funding

National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100); Natural Science Foundation of Yunnan Province (2019FA023); Yunnan Province Reserve Talent Project for Young and Middle-aged Academic and Technical Leaders (2019HB006); Yunnan High-tech Industry Special Project (201606)