Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG

Tuba Gokhan, Ted Briscoe


Abstract
Regulatory compliance questions often require aggregating evidence from multiple, interrelated sections of long, complex documents. To support question answering (QA) in this setting, we introduce ObliQA-MP, a dataset for multi-passage regulatory QA that extends the earlier ObliQA benchmark (CITATION). We improve evidence quality with an LLM-based validation step that filters out ~20% of passages missed by the prior natural language inference (NLI)-based filtering. Our benchmarks show a notable performance drop from single- to multi-passage retrieval, underscoring the challenges posed by semantic overlap and structural complexity in regulatory texts. To address this, we propose a feature-based learning-to-rank (LTR) framework that integrates lexical, semantic, and graph-derived information, achieving consistent gains over dense and hybrid baselines. We further add a lightweight score-based filter that trims the noisy tail of the retrieved list, together with an obligation-centric prompting technique. On ObliQA-MP, LTR improves retrieval (Recall@10, MAP@10, nDCG@10) over dense, hybrid, and fusion baselines. Our generation approach, which combines domain-specific filtering with prompting, achieves strong scores on the RePAS metric (CITATION) on ObliQA-MP and produces faithful, citation-grounded answers. Together, ObliQA-MP and our validation and RAG systems offer a stronger benchmark and a practical recipe for grounded, citation-controlled QA in regulatory domains.
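Retrieval is reported as Recall@10, MAP@10 and nDCG@10. For readers less familiar with these measures, here is a minimal sketch of them under binary relevance (a passage is either gold evidence for the question or not). It is not the authors' evaluation code: the function names, the use of passage-ID strings, and the choice to normalise average precision by min(|R|, k) are illustrative assumptions (some tools divide by |R| instead).

```python
import math
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of gold passages found in the top-k (assumes unique IDs in `ranked`)."""
    if not relevant:
        return 0.0
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Mean of precision@i over the ranks i <= k where a gold passage appears.

    Assumption: normalised by min(|R|, k), one common convention.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            precision_sum += hits / i  # precision at this rank
    return precision_sum / min(len(relevant), k)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-gain nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, pid in enumerate(ranked[:k], start=1) if pid in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_queries(metric, runs: Dict[str, Sequence[str]],
                      qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Average a per-query metric over all questions with gold passages (e.g. MAP@k)."""
    scores = [metric(runs.get(q, []), gold, k) for q, gold in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

For a question whose gold passages are `{"p1", "p3"}` and a retriever that returns `["p1", "p2", "p3", "p4"]`, Recall@10 is 1.0, AP@10 is (1/1 + 2/3) / 2 ≈ 0.833, and nDCG@10 is 1.5 / (1 + 1/log2 3) ≈ 0.920. In a multi-passage setting, recall rewards finding every piece of evidence, while AP and nDCG additionally reward ranking it early.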
Anthology ID: 2025.nllp-1.10
Volume: Proceedings of the Natural Legal Language Processing Workshop 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venues: NLLP | WS
Publisher: Association for Computational Linguistics
Pages: 135–146
URL: https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.10/
Cite (ACL): Tuba Gokhan and Ted Briscoe. 2025. Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 135–146, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Grounded Answers from Multi-Passage Regulations: Learning-to-Rank for Regulatory RAG (Gokhan & Briscoe, NLLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.nllp-1.10.pdf