
A Large-Model Collaborative Active Learning Framework for Legal Event Detection

Leveraging Large Language Model with Active Learning for Legal Event Detection

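  • Summary: Legal event detection aims to identify events in legal texts and classify them. However, the complexity of legal cases makes collecting high-quality annotated data very challenging. Annotation of domain data currently relies mainly on manual work, which is costly and time-consuming. Although traditional active learning can reduce part of the annotation requirement, it still depends on human intervention. The development of large models makes automated data annotation possible, but ensuring the reliability of the resulting annotations remains an open problem. To address this, the paper proposes a collaborative training paradigm that uses active learning to iteratively select training data, uses a large model to generate annotations, and applies an evaluation and filtering mechanism to retain only high-quality annotations, greatly reducing the manual annotation workload. Experiments on two event detection benchmark datasets show that the method significantly reduces the need for manual annotation in low-resource settings and, in some cases, approaches the performance of supervised learning.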

     

    Abstract: Legal event detection aims to identify and categorize events in legal texts. To reduce the need for high-quality manually annotated data in complex legal cases, we propose a collaborative training paradigm that iteratively selects informative data using active learning and employs large language models to produce and refine high-quality annotations. An evaluation and filtering mechanism is further introduced to retain only reliable annotations, significantly reducing the need for manual labeling. Extensive experiments on two event detection benchmark datasets demonstrate that our method substantially reduces the demand for manual annotations in low-resource scenarios and, in some instances, achieves performance comparable to supervised learning.
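
One round of the select, annotate, and filter loop described in the abstract might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the selection criterion or the filtering rule, so entropy-based uncertainty sampling and a simple confidence threshold stand in for them here. The names `select_uncertain`, `collaborative_round`, `predict_proba`, `llm_annotate`, and `min_confidence` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(
    pool: List[str],
    predict_proba: Callable[[str], Sequence[float]],
    k: int,
) -> List[str]:
    """Pick the k unlabeled texts the current detector is least sure about.

    Assumption: uncertainty sampling by predictive entropy; the paper may
    use a different acquisition function.
    """
    return sorted(pool, key=lambda t: entropy(predict_proba(t)), reverse=True)[:k]


def collaborative_round(
    pool: List[str],
    predict_proba: Callable[[str], Sequence[float]],
    llm_annotate: Callable[[str], Tuple[str, float]],
    k: int,
    min_confidence: float,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Run one round: select texts, label them with the LLM, filter the labels.

    llm_annotate is a hypothetical callable returning (label, confidence).
    Annotations below min_confidence are discarded and their texts stay in
    the pool; this threshold is a stand-in for the paper's evaluation and
    filtering mechanism.
    """
    kept: List[Tuple[str, str]] = []
    for text in select_uncertain(pool, predict_proba, k):
        label, confidence = llm_annotate(text)
        if confidence >= min_confidence:
            kept.append((text, label))
    kept_texts = {text for text, _ in kept}
    remaining = [t for t in pool if t not in kept_texts]
    return kept, remaining
```

In a full loop, the retained pairs would be added to the training set, the detector retrained, and the round repeated on the remaining pool until a labeling budget or a stopping criterion is reached.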

     
