Case Factor Recognition Based on Pre-trained Language Models
基于预训练语言模型的案件要素识别方法

LIU Haishun1, WANG Lei2, SUN Yuanyuan1, CHEN Yanguang1, ZHANG Shuchen1, LIN Hongfei1
(刘海顺1, 王雷2, 孙媛媛1, 陈彦光1, 张书晨1, 林鸿飞1)

Journal of Chinese Information Processing (中文信息学报), 2021, Vol. 35, Issue 11: 91-100.
Section: Information Extraction and Text Mining (信息抽取与文本挖掘)


Abstract

As an important research topic in legal intelligence, case factor recognition aims to automatically extract the important fact descriptions from legal case texts and classify them according to a factor system designed by domain experts. Text encoders based on classical neural networks struggle to extract deep-level features, and threshold-based multi-label classification fails to capture the dependencies between labels. To address these issues, this paper proposes a multi-label text classification model based on a pre-trained language model: the encoder is a pre-trained language model fine-tuned with a Layer-attentive feature-fusion strategy, and the decoder is an LSTM-based sequence generation model. Experiments on the CAIL2019 dataset show that the proposed method improves the F1 score by an average of 7.4% over recurrent-neural-network baselines, and improves the macro-averaged F1 score by an average of 3.2% over the base language model (BERT) under the same hyperparameter settings.
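The sketch below is a hedged illustration of the two components named in the abstract, not the authors' released implementation: a pre-trained encoder whose hidden layers are fused by a learned, softmax-normalized weighting over layers (a "Layer-attentive" strategy), and an SGM-style LSTM decoder that emits case-factor labels one step at a time so that dependencies between labels can be modeled. The model name bert-base-chinese, the number of factor labels, and all class and variable names are assumptions introduced only for this example.

```python
# Minimal sketch of a layer-attentive encoder plus an LSTM label-sequence decoder.
# Hyperparameters and names are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class LayerAttentiveEncoder(nn.Module):
    """Fuse all BERT layer outputs with softmax-normalized learned weights."""

    def __init__(self, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        n_layers = self.bert.config.num_hidden_layers + 1  # embedding layer + 12 blocks
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        stacked = torch.stack(out.hidden_states, dim=0)     # (layers, batch, seq, hidden)
        weights = torch.softmax(self.layer_logits, dim=0).view(-1, 1, 1, 1)
        fused = (weights * stacked).sum(dim=0)              # (batch, seq, hidden)
        return fused[:, 0, :]                               # [CLS] position as text vector


class LabelSequenceDecoder(nn.Module):
    """SGM-style decoder: predict one label per step with an LSTM cell."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # +2 for <bos> and <eos> pseudo-labels that start / stop generation
        self.label_emb = nn.Embedding(num_labels + 2, hidden_size)
        self.lstm = nn.LSTMCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, num_labels + 2)

    def forward(self, text_vec, max_steps: int = 5):
        batch = text_vec.size(0)
        h, c = text_vec, torch.zeros_like(text_vec)          # init state from the encoder
        prev = torch.zeros(batch, dtype=torch.long, device=text_vec.device)  # <bos> id = 0
        logits = []
        for _ in range(max_steps):
            h, c = self.lstm(self.label_emb(prev), (h, c))
            step_logits = self.out(h)
            logits.append(step_logits)
            prev = step_logits.argmax(dim=-1)                # greedy decoding for the sketch
        return torch.stack(logits, dim=1)                    # (batch, max_steps, num_labels+2)


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = LayerAttentiveEncoder()
    decoder = LabelSequenceDecoder(hidden_size=768, num_labels=20)  # 20 factors assumed
    batch = tokenizer(["被告人酒后驾驶机动车发生交通事故。"], return_tensors="pt")
    label_logits = decoder(encoder(batch["input_ids"], batch["attention_mask"]))
    print(label_logits.shape)  # torch.Size([1, 5, 22])
```

In training, the decoder would normally be fed the gold label sequence (teacher forcing) and optimized with cross-entropy per step; greedy decoding is shown here only to keep the forward pass self-contained.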


Key words

case factor recognition / multi-label text classification / legal intelligence

Cite this article

刘海顺, 王雷, 孙媛媛, 陈彦光, 张书晨, 林鸿飞. 基于预训练语言模型的案件要素识别方法. 中文信息学报, 2021, 35(11): 91-100.
LIU Haishun, WANG Lei, SUN Yuanyuan, CHEN Yanguang, ZHANG Shuchen, LIN Hongfei. Case Factor Recognition Based on Pre-trained Language Models. Journal of Chinese Information Processing, 2021, 35(11): 91-100.

References

[1] Luo B, Feng Y, Xu J, et al. Learning to predict charges for criminal cases with legal basis[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2017: 2727-2736.
[2] Zhong H, Guo Z, Tu C, et al. Legal judgment prediction via topological learning[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 3540-3549.
[3] Hu Z, Li X, Tu C, et al. Few-shot charge prediction with discriminative legal attributes[C]//Proceedings of the 27th International Conference on Computational Linguistics, 2018: 487-498.
[4] 王礼敏. 面向法律文书的中文命名实体识别方法研究[D]. 苏州: 苏州大学硕士学位论文, 2018.
[5] 谢云. 面向中文法律文本的命名实体识别研究[D]. 南京: 南京师范大学硕士学位论文, 2018.
[6] Kim Y. Convolutional neural networks for sentence classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[7] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 1480-1489.
[8] Peters M E, Neumann M, Iyyer M, et al. Deep contextualized word representations[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018: 2227-2237.
[9] Boutell M R, Luo J, Shen X, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757-1771.
[10] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019: 4171-4186.
[11] Yang P, Sun X, Li W, et al. SGM: Sequence generation model for multi-label classification[C]//Proceedings of the 27th International Conference on Computational Linguistics, 2018: 3915-3926.
[12] Kort F. Predicting Supreme Court decisions mathematically: A quantitative analysis of the "right to counsel" cases[J]. American Political Science Review, 1957, 51(1): 1-12.
[13] Ulmer S S. Quantitative analysis of judicial processes: Some practical and theoretical applications[J]. Law and Contemporary Problems, 1963, 28(1): 64-84.
[14] Shapira M. Computerized decision technology in social service[J]. International Journal of Sociology and Social Policy, 1990, 10: 138-164.
[15] Hassett P. Can expert system technology contribute to improved bail decisions[J]. International Journal of Law and Information Technology, 1993, 1(2): 144.
[16] Aletras N, Tsarapatsanis D, Pietro D P, et al. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective[J]. PeerJ Computer Science, 2016, 2: e93.
[17] Sulea O M, Zampieri M, Malmasi S, et al. Exploring the use of text classification in the legal domain[J/OL]. arXiv preprint arXiv: 1710.09306, 2017.
[18] Xiao C, Zhong H, Guo Z, et al. CAIL2018: A large-scale legal dataset for judgment prediction[J/OL]. arXiv preprint arXiv: 1807.02478, 2018.
[19] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J/OL]. arXiv preprint arXiv: 1301.3781, 2013.
[20] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[21] Lin Z, Feng M, Santos C N D, et al. A structured self-attentive sentence embedding[J/OL]. arXiv preprint arXiv: 1703.03130, 2017.
[22] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Proceedings of the NIPS, 2017: 5998-6008.
[23] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[C]//Proceedings of the NIPS, 2019: 5754-5764.
[24] Cui Y, Che W, Liu T, et al. Pre-training with whole word masking for Chinese BERT[J/OL]. arXiv preprint arXiv: 1906.08101, 2019.
[25] Liu Y, Ott M, Goyal N, et al. RoBERTa: A robustly optimized BERT pretraining approach[J/OL]. arXiv preprint arXiv: 1907.11692, 2019.
[26] Qiao Y, Xiong C, Liu Z,et al. Understanding the behaviors of BERT in ranking[J/OL]. arXiv preprint arXiv: 1904.07531, 2019.
[27] Sun C, Qiu X, Xu Y, et al. How to fine-tune BERT for text classification[C]//Proceedings of the China National Conference on Chinese Computational Linguistics, 2019: 194-206.
[28] Klambauer G, Unterthiner T, Mayr A, et al. Self-normalizing neural networks[C]//Proceedings of NIPS, 2017: 971-980.
[29] Dai Z, Yang Z, Yang Y, et al. Transformer-XL: Attentive language models beyond a fixed-length context[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 2978-2988.

Funding

National Key Research and Development Program of China (2018YFC0830603)