Shujun Liu
2025
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
Shengbin Yue | Ting Huang | Zheng Jia | Siyuan Wang | Shujun Liu | Yun Song | Xuanjing Huang | Zhongyu Wei
Findings of the Association for Computational Linguistics: NAACL 2025
Large Language Models (LLMs) have significantly advanced legal intelligence, but the scarcity of scenario data impedes progress toward interactive legal scenarios. This paper introduces a Multi-agent Legal Simulation Driver (MASER) to scalably generate synthetic data by simulating interactive legal scenarios. Leveraging real legal case sources, MASER ensures the consistency of legal attributes between participants and introduces a supervisory mechanism to align participants' characters and behaviors and to address distractions. A Multi-stage Interactive Legal Evaluation (MILE) benchmark is further constructed to evaluate LLMs' performance in dynamic legal scenarios. Extensive experiments confirm the effectiveness of our framework.
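The abstract only sketches MASER at a high level; as a rough illustration of the kind of loop it describes (role-conditioned agents exchanging turns under a supervisor that vetoes off-character or distracted replies), a minimal Python sketch follows. Everything here is a hypothetical reading of the abstract, not the paper's actual interface: `llm_call`, the role prompts, and the OK/REDO supervisory check are all illustrative placeholders.

```python
# Minimal sketch of a supervised multi-agent legal-scenario simulation,
# inferred from the MASER abstract above. All prompts and the llm_call
# placeholder are illustrative assumptions, not the published method.

from dataclasses import dataclass, field


def llm_call(system_prompt: str, history: list[str]) -> str:
    """Placeholder for any chat-LLM API; swap in a real client."""
    raise NotImplementedError


@dataclass
class Agent:
    role: str                      # e.g. "client", "lawyer", "supervisor"
    profile: str                   # character/legal attributes drawn from a real case source
    history: list[str] = field(default_factory=list)

    def respond(self) -> str:
        prompt = f"You are the {self.role}. Profile: {self.profile}"
        return llm_call(prompt, self.history)


def simulate(client: Agent, lawyer: Agent, supervisor: Agent, turns: int = 6) -> list[str]:
    """Alternate client/lawyer turns; the supervisor flags replies that
    break character or drift off the legal matter and asks for a redo."""
    transcript: list[str] = []
    speakers = [client, lawyer]
    for t in range(turns):
        speaker = speakers[t % 2]
        utterance = speaker.respond()
        verdict = llm_call(
            f"You are the {supervisor.role}. Check that the last reply stays "
            f"in character and on the legal matter. Answer OK or REDO.",
            transcript + [utterance],
        )
        if "REDO" in verdict:
            utterance = speaker.respond()   # one retry under supervision
        transcript.append(f"{speaker.role}: {utterance}")
        for agent in speakers:
            agent.history.append(utterance)
    return transcript
```

The transcripts such a loop produces would be the synthetic interaction data the abstract refers to; the single-retry supervision step is a simplification of whatever alignment mechanism the paper actually uses.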
2024
ALaRM: Align Language Models via Hierarchical Rewards Modeling
Yuhang Lai | Siyuan Wang | Shujun Liu | Xuanjing Huang | Zhongyu Wei
Findings of the Association for Computational Linguistics: ACL 2024
We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework addresses the limitations of current alignment approaches, which often struggle with the inconsistency and sparsity of human supervision signals, by integrating holistic rewards with aspect-specific rewards. This integration enables more precise and consistent guidance of language models towards desired outcomes, particularly in complex and open text generation tasks. By employing a methodology that filters and combines multiple rewards based on their consistency, the framework provides a reliable mechanism for improving model alignment. We validate our approach through applications in long-form question answering and machine translation tasks, employing gpt-3.5-turbo for pairwise comparisons, and demonstrate improvements over existing baselines. Our work underscores the effectiveness of hierarchical rewards modeling in refining LLM training processes for better human preference alignment. We release our code at https://ALaRM-fdu.github.io.
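The abstract describes ALaRM's core mechanism (filtering aspect-specific rewards for consistency with a holistic reward, then combining them into one training signal) only in prose. The short Python sketch below illustrates that idea under stated assumptions: the sign-agreement consistency test and the linear weighting are illustrative guesses, not the paper's published formulation.

```python
# A minimal sketch of hierarchical reward combination as the ALaRM
# abstract describes it: keep an aspect-specific reward only when it
# agrees with the holistic reward, then fold the survivors into a
# single scalar. The agreement test and weights are assumptions.

def combined_reward(
    holistic: float,
    aspects: dict[str, float],
    weights: dict[str, float],
    baseline: float = 0.0,
) -> float:
    # Consistency filter: keep an aspect reward only if it points the
    # same way as the holistic reward relative to the baseline.
    consistent = {
        name: r for name, r in aspects.items()
        if (r - baseline) * (holistic - baseline) >= 0
    }
    # Fold the surviving aspect rewards into the holistic signal.
    bonus = sum(weights.get(name, 0.0) * r for name, r in consistent.items())
    return holistic + bonus


# Example: a long-form QA response with holistic reward 0.7 and two
# aspect rewards; the conflicting one is filtered out.
r = combined_reward(
    holistic=0.7,
    aspects={"factuality": 0.9, "readability": -0.2},
    weights={"factuality": 0.5, "readability": 0.5},
)
print(r)  # 0.7 + 0.5 * 0.9 = 1.15
```

The point of the filter is the one the abstract makes: aspect signals are only trusted when they are consistent with the holistic judgment, which guards against sparse or contradictory supervision pulling the policy in conflicting directions.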
Co-authors
- Xuan-Jing Huang (黄萱菁): 2 papers
- Siyuan Wang (王思远): 2 papers
- Zhongyu Wei (魏忠钰): 2 papers
- Ting Huang: 1 paper
- Zheng Jia: 1 paper