Case Factor Recognition Based on Pre-trained Language Models
基于预训练语言模型的案件要素识别方法

LIU Haishun1, WANG Lei2, SUN Yuanyuan1, CHEN Yanguang1, ZHANG Shuchen1, LIN Hongfei1
(刘海顺1, 王雷2, 孙媛媛1, 陈彦光1, 张书晨1, 林鸿飞1)

Journal of Chinese Information Processing (中文信息学报), 2021, Vol. 35, Issue 11: 91-100.
Section: Information Extraction and Text Mining (信息抽取与文本挖掘)


Abstract

As an important research topic in legal intelligence, case factor recognition aims to automatically extract the important fact descriptions from legal case texts and classify them according to a factor system designed by domain experts. Text encoders based on classical neural networks struggle to extract deep-level features, and threshold-based multi-label classification fails to capture the dependencies between labels. To address these issues, this paper proposes a multi-label text classification model based on a pre-trained language model: the encoder is a pre-trained language model fine-tuned with a Layer-attentive feature-fusion strategy, and the decoder is an LSTM-based sequence generation model. Experiments on the CAIL2019 dataset show that the proposed method improves the F1 score by an average of 7.4% over recurrent-neural-network baselines, and improves the macro-averaged F1 score by an average of 3.2% over the base language model (BERT) under the same hyperparameter settings.
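The sketch below is a hedged illustration of the two components named in the abstract, not the authors' released implementation: a pre-trained encoder whose hidden layers are fused by a learned, softmax-normalized weighting over layers (a "Layer-attentive" strategy), and an SGM-style LSTM decoder that emits case-factor labels one step at a time so that dependencies between labels can be modeled. The model name bert-base-chinese, the number of factor labels, and all class and variable names are assumptions introduced only for this example.

```python
# Minimal sketch of a layer-attentive encoder plus an LSTM label-sequence decoder.
# Hyperparameters and names are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class LayerAttentiveEncoder(nn.Module):
    """Fuse all BERT layer outputs with softmax-normalized learned weights."""

    def __init__(self, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        n_layers = self.bert.config.num_hidden_layers + 1  # embedding layer + 12 blocks
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        stacked = torch.stack(out.hidden_states, dim=0)     # (layers, batch, seq, hidden)
        weights = torch.softmax(self.layer_logits, dim=0).view(-1, 1, 1, 1)
        fused = (weights * stacked).sum(dim=0)              # (batch, seq, hidden)
        return fused[:, 0, :]                               # [CLS] position as text vector


class LabelSequenceDecoder(nn.Module):
    """SGM-style decoder: predict one label per step with an LSTM cell."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # +2 for <bos> and <eos> pseudo-labels that start / stop generation
        self.label_emb = nn.Embedding(num_labels + 2, hidden_size)
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, num_labels + 2)

    def forward(self, text_vec, max_steps: int = 5):
        batch = text_vec.size(0)
        h, c = text_vec, torch.zeros_like(text_vec)          # init state from the encoder
        prev = torch.zeros(batch, dtype=torch.long, device=text_vec.device)  # <bos> id = 0
        logits = []
        for _ in range(max_steps):
            h, c = self.lstm(self.label_emb(prev), (h, c))
            step_logits = self.out(h)
            logits.append(step_logits)
            prev = step_logits.argmax(dim=-1)                # greedy decoding for the sketch
        return torch.stack(logits, dim=1)                    # (batch, max_steps, num_labels+2)


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = LayerAttentiveEncoder()
    decoder = LabelSequenceDecoder(hidden_size=768, num_labels=20)  # 20 factors assumed
    batch = tokenizer(["被告人酒后驾驶机动车发生交通事故。"], return_tensors="pt")
    label_logits = decoder(encoder(batch["input_ids"], batch["attention_mask"]))
    print(label_logits.shape)  # torch.Size([1, 5, 22])
```

In training, the decoder would normally be fed the gold label sequence (teacher forcing) and optimized with cross-entropy per step; greedy decoding is shown here only to keep the forward pass self-contained.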


Key words

case factor recognition / multi-label text classification / legal intelligence

Cite this article

刘海顺, 王雷, 孙媛媛, 陈彦光, 张书晨, 林鸿飞. 基于预训练语言模型的案件要素识别方法. 中文信息学报, 2021, 35(11): 91-100.
LIU Haishun, WANG Lei, SUN Yuanyuan, CHEN Yanguang, ZHANG Shuchen, LIN Hongfei. Case Factor Recognition Based on Pre-trained Language Models. Journal of Chinese Information Processing, 2021, 35(11): 91-100.

References

[1] Luo B, Feng Y, Xu J, et al. Learning to predict charges for criminal cases with legal basis[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2017: 2727-2736.
[2] Zhong H, Guo Z, Tu C, et al. Legal judgment prediction via topological learning[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 3540-3549.
[3] Hu Z, Li X, Tu C, et al. Few-shot charge prediction with discriminative legal attributes[C]//Proceedings of the 27th International Conference on Computational Linguistics, 2018: 487-498.
[4] 王礼敏. 面向法律文书的中文命名实体识别方法研究[D]. 苏州: 苏州大学硕士学位论文, 2018.
[5] 谢云. 面向中文法律文本的命名实体识别研究[D]. 南京: 南京师范大学硕士学位论文, 2018.
[6] Kim Y. Convolutional neural networks for sentence classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[7] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 1480-1489.
[8] Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 2227-2237.
[9] Boutell M R, Luo J, Shen X, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757-1771.
[10] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019: 4171-4186.
[11] Yang P, Sun X, Li W, et al. SGM: Sequence generation model for multi-label classification[C]//Proceedings of the 27th International Conference on Computational Linguistics, 2018: 3915-3926.
[12] Kort F. Predicting Supreme Court decisions mathematically: A quantitative analysis of the "right to counsel" cases[J]. American Political Science Review, 1957, 51(1): 1-12.
[13] Ulmer S S. Quantitative analysis of judicial processes: Some practical and theoretical applications[J]. Law and Contemporary Problems, 1963, 28(1): 64-84.
[14] Shapira M. Computerized decision technology in social service[J]. International Journal of Sociology and Social Policy, 1990, 10: 138-164.
[15] Hassett P. Can expert system technology contribute to improved bail decisions[J]. International Journal of Law and Information Technology, 1993, 1(2): 144.
[16] Aletras N, Tsarapatsanis D, Pietro D P, et al. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective[J]. PeerJ Computer Science, 2016, 2: e93.
[17] Sulea O M, Zampieri M, Malmasi S, et al. Exploring the use of text classification in the legal domain[J/OL]. arXiv preprint arXiv: 1710.09306, 2017.
[18] Xiao C, Zhong H, Guo Z, et al. CAIL2018: A large-scale legal dataset for judgment prediction[J/OL]. arXiv preprint arXiv: 1807.02478, 2018.
[19] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J/OL]. arXiv preprint arXiv: 1301.3781, 2013.
[20] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[21] Lin Z, Feng M, Santos C N D, et al. A structured self-attentive sentence embedding[J/OL]. arXiv preprint arXiv: 1703.03130, 2017.
[22] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the NIPS, 2017: 5998-6008.
[23] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[C]//Proceedings of the NIPS, 2019: 5754-5764.
[24] Cui Y, Che W, Liu T, et al. Pre-training with whole word masking for Chinese BERT[J/OL]. arXiv preprint arXiv: 1906.08101, 2019.
[25] Liu Y, Ott M, Goyal N, et al. RoBERTa: A robustly optimized BERT pretraining approach[J/OL]. arXiv preprint arXiv: 1907.11692, 2019.
[26] Qiao Y, Xiong C, Liu Z,et al. Understanding the behaviors of BERT in ranking[J/OL]. arXiv preprint arXiv: 1904.07531, 2019.
[27] Sun C, Qiu X, Xu Y, et al. How to fine-tune BERT for text classification[C]//Proceedings of the China National Conference on Chinese Computational Linguistics, 2019: 194-206.
[28] Klambauer G, Unterthiner T, Mayr A, et al. Self-normalizing neural networks[C]//Proceedings of NIPS, 2017: 971-980.
[29] Dai Z, Yang Z, Yang Y, et al. Transformer-XL: Attentive language models beyond a fixed-length context[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 2978-2988.

Funding

National Key Research and Development Program of China (2018YFC0830603)