Natural Language Understanding for Legal Text: A Review
AN Zhenwei1, LAI Yuxuan2, FENG Yansong2
1. Wangxuan Institute of Computer Technology, Peking University, Beijing 100080, China; 2. Department of Computer Science, The Open University of China, Beijing 100039, China
Abstract In recent years, legal artificial intelligence has attracted increasing attention for its efficiency and convenience. Because text is the most common form that legal practice takes, using natural language understanding methods to process legal text automatically is an important direction for both academia and industry. In this paper, we survey recent advances in natural language understanding for legal text. We first introduce the popular task setups: legal information extraction, legal case retrieval, legal question answering, legal text summarization, and legal judgment prediction. We then discuss the main challenges from three perspectives: understanding how legal language differs from open-domain language, understanding the rich argumentative text in legal documents, and incorporating legal knowledge into existing natural language processing models.
Received: 16 August 2021