Natural Language Understanding for Legal Text: A Review

Journal of Chinese Information Processing, 2022, Vol. 36, Issue 8: 1-11

安震威¹, 来雨轩², 冯岩松¹

AN Zhenwei¹, LAI Yuxuan², FENG Yansong²


Abstract

Owing to its efficiency and convenience, legal artificial intelligence has attracted wide attention across society in recent years. Legal documents are the most common form in which law appears in social life, so applying natural language understanding (NLU) methods to process their content automatically is an important direction for both research and applications. This paper reviews and summarizes NLU techniques for legal text. It first introduces five task setups: legal information extraction, legal case retrieval, legal question answering, legal text summarization, and legal judgment prediction. It then discusses the main challenges of applying existing NLU techniques to legal text, arguing that such models must bridge the gap in expression between legal documents and everyday language, model the reasoning and argumentation structures specific to legal documents, and incorporate legal knowledge, such as statutes and patterns of legal reasoning, into NLU models.
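Legal case retrieval, one of the task setups listed above, ranks candidate cases by their relevance to a query case. Purely as an illustration, the sketch below ranks cases with Okapi BM25, a standard lexical-matching baseline; it is not a method proposed or evaluated in this review. It assumes the fact descriptions have already been segmented into token lists (Chinese text needs a word segmenter first), and the function names `bm25_scores` and `rank_cases` and the toy case data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each pre-tokenized document against a pre-tokenized query with Okapi BM25.

    query: list of terms; docs: list of term lists (hypothetical case fact descriptions).
    k1 controls term-frequency saturation; b controls document-length normalization.
    """
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue  # terms absent from this document contribute nothing
            # Non-negative IDF variant (the "+1" inside the log keeps it above zero).
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            f = tf[t]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def rank_cases(query, docs, top_k=3):
    """Return the indices of the top_k documents, highest BM25 score first."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    # Toy, already-tokenized "fact descriptions" (hypothetical data).
    cases = [
        ["defendant", "stole", "phone", "store"],
        ["defendant", "assaulted", "victim", "injury"],
        ["defendant", "stole", "wallet", "bus"],
    ]
    print(rank_cases(["stole", "phone"], cases, top_k=2))
```

A purely lexical scorer like this one cannot see that two cases are legally similar when they describe the same conduct in different words, which is one concrete instance of the gap between legal and everyday language that the review identifies as a challenge.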

Cite this article

AN Zhenwei, LAI Yuxuan, FENG Yansong. Natural Language Understanding for Legal Text: A Review. Journal of Chinese Information Processing, 2022, 36(8): 1-11.


Funding

National Key Research and Development Program of China, Ministry of Science and Technology (2018YFC0931906)


Received: 2021-08-16
Published: 2022-09-26
Issue date: 2022-09-26
