@inproceedings{gokhan-briscoe-2025-grounded,
title = "Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory {RAG}",
author = "Gokhan, Tuba and
Briscoe, Ted",
editor = "Aletras, Nikolaos and
Chalkidis, Ilias and
Barrett, Leslie and
Goanț{\u{a}}, C{\u{a}}t{\u{a}}lina and
Preoțiuc-Pietro, Daniel and
Spanakis, Gerasimos",
booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.10/",
pages = "135--146",
ISBN = "979-8-89176-338-8",
abstract = "Regulatory compliance questions often require aggregating evidence from multiple, interrelated sections of long, complex documents. To support question-answering (QA) in this setting, we introduce \textbf{ObliQA-MP}, a dataset for multi-passage regulatory QA, extending the earlier ObliQA benchmark (CITATION), and improve evidence quality with an LLM{--}based validation step that filters out {\textasciitilde}20{\%} of passages missed by prior natural language inference (NLI) based filtering. Our benchmarks show a notable performance drop from single- to multi-passage retrieval, underscoring the challenges of semantic overlap and structural complexity in regulatory texts. To address this, we propose a \textbf{feature-based learning-to-rank (LTR)} framework that integrates lexical, semantic, and graph-derived information, achieving consistent gains over dense and hybrid baselines. We further add a lightweight score-based filter to trim noisy tails and an obligation-centric prompting technique. On ObliQA-MP, LTR improves retrieval (Recall@10/MAP@10/nDCG@10) over dense, hybrid, and fusion baselines. Our generation approach, based on domain-specific filtering plus prompting, achieves strong scores using the RePAS metric (CITATION) on ObliQA-MP, producing faithful, citation-grounded answers. Together, \textbf{ObliQA-MP} and our validation and RAG systems offer a stronger benchmark and a practical recipe for grounded, citation-controlled QA in regulatory domains."
}