Incorporating Case Elements for Case Matching (融合案件要素的相似案例匹配)

LIU Quan, YU Zhengtao, GAO Shengxiang, HE Shizhu, LIU Kang

Journal of Chinese Information Processing, 2022, Vol. 36, Issue (11): 140-147.
Natural Language Understanding and Generation


Incorporating Case Elements for Case Matching

  • LIU Quan1,2, YU Zhengtao1,2, GAO Shengxiang1,2, HE Shizhu3, LIU Kang3

摘要 (Abstract)

Similar case matching is an important task in intelligent justice: by comparing the semantic content of two cases, it judges their degree of similarity, supporting applications such as similar-case retrieval and consistent judgments for similar cases. Compared with ordinary text, legal documents are not only longer, but the differences between them are also more subtle, so traditional deep matching models struggle to achieve satisfactory results. To address these problems, this paper truncates document text according to the writing conventions of legal judgments and proposes a method that incorporates case elements to improve similar-case matching performance. Specifically, taking private lending cases as the application scenario, the paper first formulates six types of private-lending case elements based on legal knowledge, extracts the case elements from legal documents with regular expressions, and forms a word-level one-hot representation of them. Next, the legal text is truncated in reverse order (keeping its tail) and encoded with BERT to obtain a text representation, alleviating the long-distance dependency problem of legal text. A linear network then fuses the legal text representation with the case-element representation, and a BiLSTM maps the fused representation into a higher-dimensional space. Finally, a similarity matrix over the vector representations is constructed within a Siamese network framework, and the final similarity judgment is made through semantic interaction and vector pooling. Experimental results show that the proposed model can effectively handle long text and model the subtle differences between legal documents, outperforming the baseline models on the CAIL2019-SCM public dataset.
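The regular-expression extraction of case elements described above can be sketched roughly as follows. The element names and patterns here are illustrative assumptions — the paper's six private-lending elements are not enumerated in this abstract — so this is a minimal sketch of the one-hot element representation, not the authors' actual patterns.

```python
import re

# Hypothetical patterns for a few private-lending case elements;
# the paper defines six such elements, which are not listed here.
ELEMENT_PATTERNS = {
    "written_iou": r"借条|欠条|借据",                 # a written IOU is mentioned
    "interest_agreed": r"约定.{0,6}利息|月息|年息",    # an interest agreement exists
    "repayment_made": r"已(?:偿|归)还|偿还了",         # repayment has been made
}

def extract_elements(text: str) -> list[int]:
    """Return a one-hot style vector: 1 if an element's pattern occurs."""
    return [1 if re.search(p, text) else 0 for p in ELEMENT_PATTERNS.values()]

doc = "原告持有借条一张，双方约定月息2%，被告未偿还。"
print(extract_elements(doc))  # → [1, 1, 0]
```

In the paper this binary element vector is later fused with the BERT text representation; here it is shown in isolation.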

Abstract

Similar case matching is an important task in intelligent justice, especially for similar-case retrieval and consistent judgments across similar cases. Owing to the length of legal documents and the subtle differences between them, it is difficult for existing deep matching models to achieve ideal results. To address this issue, this paper proposes a method that integrates case elements to improve the matching of similar cases, with a focus on private lending cases. First, six types of private-lending case elements are formulated, extracted by regular expressions, and represented as one-hot word vectors. Then the legal text is truncated in reverse order and encoded by BERT to capture long-distance dependencies. The legal text representation and the case element representation are fused by a linear network and then encoded by a BiLSTM for a higher-dimensional representation. Finally, a similarity matrix over the vector representations is constructed within a Siamese network framework, and the final similarity is decided by semantic interaction and vector pooling. Experimental results show that the proposed model outperforms the baseline models on the CAIL2019-SCM public dataset.
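The tail-truncation and representation-fusion steps can be sketched with toy stand-ins: a mean-pooled random embedding replaces BERT, a single matrix replaces the linear fusion layer, and the BiLSTM, semantic interaction, and pooling stages are omitted. All dimensions and names are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

MAX_LEN = 512  # BERT's standard input-length limit

def tail_truncate(token_ids, max_len=MAX_LEN):
    """Keep the *last* max_len tokens: Chinese civil judgments tend to state
    the decisive facts and holdings near the end, which is why the paper
    truncates in reverse order rather than keeping the document head."""
    return token_ids[-max_len:]

rng = np.random.default_rng(0)
VOCAB, TEXT_DIM, N_ELEMENTS, FUSED_DIM = 30522, 8, 6, 16

embed = rng.normal(size=(VOCAB, TEXT_DIM))                    # stand-in for BERT
W_fuse = rng.normal(size=(TEXT_DIM + N_ELEMENTS, FUSED_DIM))  # linear fusion layer

def encode(token_ids, elements):
    text_vec = embed[tail_truncate(token_ids)].mean(axis=0)   # pooled text repr.
    return np.concatenate([text_vec, elements]) @ W_fuse      # fused repr.

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two toy "documents" (600 token ids each) with their element one-hot vectors.
doc_a = encode(list(range(600)), np.array([1, 0, 1, 0, 0, 0]))
doc_b = encode(list(range(100, 700)), np.array([1, 0, 1, 0, 0, 1]))
print(round(cosine(doc_a, doc_b), 3))  # a similarity score in [-1, 1]
```

In the full model, the fused vectors would further pass through a BiLSTM and a Siamese similarity matrix before the final decision; cosine similarity here merely stands in for that comparison.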

关键词 (Keywords)

similar case matching / case elements / pre-trained language model

Key words

case matching / case elements / pre-trained language model

引用本文 (Cite This Article)
LIU Quan, YU Zhengtao, GAO Shengxiang, HE Shizhu, LIU Kang. Incorporating Case Elements for Case Matching. Journal of Chinese Information Processing. 2022, 36(11): 140-147

参考文献 (References)

[1] CHEN Q, ZHU X, LING Z, et al. Enhanced LSTM for natural language inference[C]//Proceedings of the Meeting of the Association for Computational Linguistics, 2017: 1657-1668.
[2] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019: 4171-4186.
[3] XIAO C, ZHONG H, GUO Z, et al. CAIL2019-SCM: A dataset of similar case matching in legal domain[J]. arXiv preprint arXiv:1911.08962, 2019.
[4] BOWMAN S, ANGELI G, POTTS C, et al. A large annotated corpus for learning natural language inference[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015: 632-642.
[5] WANG Z, HAMZA W, FLORIAN R. Bilateral multi-perspective matching for natural language sentences[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017: 4144-4150.
[6] RAJPURKAR P, ZHANG J, LOPYREV K, et al. SQuAD: 100,000+ questions for machine comprehension of text[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016: 2383-2392.
[7] CONNEAU A, KIELA D, SCHWENK H, et al. Supervised learning of universal sentence representations from natural language inference data[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2017: 670-680.
[8] YIN W, SCHÜTZE H. Convolutional neural network for paraphrase identification[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015: 901-911.
[9] MUELLER J, THYAGARAJAN A. Siamese recurrent architectures for learning sentence similarity[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016.
[10] WANG W, YAN M, WU C. Multi-granularity hierarchical attention fusion networks for reading comprehension and question answering[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018: 1705-1714.
[11] PARIKH A, TÄCKSTRÖM O, DAS D, et al. A decomposable attention model for natural language inference[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2016: 2249-2255.
[12] GONG Y, LUO H, ZHANG J. Natural language inference over interaction space[C]//Proceedings of International Conference on Learning Representations, 2018: 1-15.
[13] YANG R, ZHANG J, GAO X, et al. Simple and effective text matching with richer alignment features[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 4699-4709.
[14] TAY Y, LUU A T, HUI S C. Co-stack residual affinity networks with multi-level attention refinement for matching text sequences[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2018: 4492-4502.
[15] LIANG D, ZHANG F, ZHANG Q, et al. Asynchronous deep interaction network for natural language inference[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019: 2692-2700.
[16] HU Z, LI X, TU C, et al. Few-shot charge prediction with discriminative legal attributes[C]//Proceedings of the 27th International Conference on Computational Linguistics, 2018: 487-498.
[17] FAWEI B, PAN J Z, KOLLINGBAUM M, et al. A methodology for a criminal law and procedure ontology for legal question answering[C]//Proceedings of the Joint International Semantic Technology Conference. Springer, Cham, 2018: 198-214.
[18] VACEK T, SCHILDER F. A sequence approach to case outcome detection[C]//Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law, 2017: 209-215.
[19] KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014: 1746-1751.
[20] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.

基金 (Funding)

National Key Research and Development Program of China (2018YFC0830101, 2018YFC0830105, 2018YFC0830100); National Natural Science Foundation of China (61972186, 61761026, 61762056); Yunnan Provincial Major Science and Technology Special Program (202002AD080001-5); Yunnan Fundamental Research Program (202001AS070014, 2018FB104); Yunnan High-Tech Industry Special Project (201606); Yunnan Talent Training Program (KKSY201703005)