MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation

Yash Malviya; Karan Dhingra; Maneesh Singh

Abstract

Regulatory documents are rich in nuanced terminology and specialized semantics. FRAG systems (frozen retrieval-augmented generators built from pre-trained, or frozen, components) consequently face challenges with both retrieval and answering performance. We present a system that adapts the retriever to the target domain using a multi-stage tuning (MST) strategy. Our retrieval approach, called MST-R, (a) first fine-tunes the encoders used in vector stores using hard negative mining, (b) then uses a hybrid retriever that combines sparse and dense retrievers using reciprocal rank fusion, and (c) finally adapts the cross-attention encoder by fine-tuning it on only the top-k retrieved results. We benchmark the system's performance on the dataset released for the RIRAG challenge (as part of the RegNLP workshop at COLING 2025). We achieve significant performance gains, obtaining a top rank on the RegNLP challenge leaderboard. We also show that a trivial answering approach *games* the RePASs metric, outscoring all baselines and a pre-trained Llama model. Analyzing this anomaly, we present important takeaways for future research. We also release our [code base](https://github.com/Indic-aiDias/MST-R).
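As an illustration of step (b), the sketch below shows one common formulation of reciprocal rank fusion, in which each document's fused score is the sum of 1 / (k + rank) over the rankings it appears in. It is a minimal sketch, not the authors' implementation: the function name, the smoothing constant `k=60`, and the toy document ids are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a commonly used default; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a sparse and a dense retriever.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Documents ranked highly by both retrievers rise to the top, while documents found by only one retriever still remain in the candidate list passed on to the re-ranking stage.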

Anthology ID: 2025.regnlp-1.7
Volume: Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Tuba Gokhan, Kexin Wang, Iryna Gurevych, Ted Briscoe
Venues: RegNLP | WS
Publisher: Association for Computational Linguistics
Pages: 41–51
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.regnlp-1.7/
Cite (ACL): Yash Malviya, Karan Dhingra, and Maneesh Singh. 2025. MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation. In Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025), pages 41–51, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation (Malviya et al., RegNLP 2025)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.regnlp-1.7.pdf
