Improving LLM Domain Certification with Pretrained Guide Models

Jiaqian Zhang, Zhaozhi Qian, Faroq AL-Tam, Ignacio Iacobacci, Muhammad AL-Qurishi, Riad Souissi


Abstract
Large language models (LLMs) often generate off-domain or harmful responses when deployed in specialized, high-stakes domains, motivating the need for rigorous LLM domain certification. While the VALID algorithm (Emde et al., 2025) achieves formal domain certification guarantees using a guide model G trained from scratch on in-domain data, it suffers from poor generalization due to limited training. In this work, we propose PRISM, a novel approach that overcomes this key limitation by leveraging pretrained language models as guide models, enhanced via contrastive fine-tuning to sharply distinguish acceptable from refused content. We explore and experiment with variants of PRISM that use different loss functions, ensuring that the model exploits the rich world knowledge of pretrained models while remaining aligned with the target domain. We show that two variants of PRISM, PRISM-BC and PRISM-GA, achieve superior OOD rejection and tighter certification bounds across eight diverse data regimes and perturbations, establishing a more reliable approach to domain-adherent LLM deployment.
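The contrastive fine-tuning idea described above, pushing the guide model's likelihood of acceptable in-domain text above its likelihood of refused content, can be illustrated with a minimal margin-based objective. This is only a hedged sketch of the general technique, not the paper's actual PRISM-BC or PRISM-GA losses; the function name, margin value, and the assumption that the guide model exposes per-sequence log-probabilities are all illustrative.

```python
def contrastive_margin_loss(logp_accept: float, logp_refuse: float,
                            margin: float = 1.0) -> float:
    """Hinge-style contrastive objective on guide-model log-probabilities.

    logp_accept: log-probability the guide model assigns to an acceptable,
                 in-domain sequence (illustrative input, not the paper's API).
    logp_refuse: log-probability it assigns to content that should be refused.
    The loss is zero once the acceptable sequence is preferred by at least
    `margin` nats; otherwise it penalizes the shortfall linearly.
    """
    return max(0.0, margin - (logp_accept - logp_refuse))


def batch_loss(pairs: list[tuple[float, float]], margin: float = 1.0) -> float:
    """Average the pairwise margin loss over a batch of (accept, refuse) pairs."""
    return sum(contrastive_margin_loss(a, r, margin) for a, r in pairs) / len(pairs)
```

For example, a pair scored (-2.0, -5.0) already satisfies a margin of 1.0 and contributes zero loss, while a pair scored (-4.0, -4.5) is only 0.5 nats apart and incurs a loss of 0.5, nudging fine-tuning to widen the gap between acceptable and refused content.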
Anthology ID:
2026.eacl-long.69
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1494–1510
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.69/
Cite (ACL):
Jiaqian Zhang, Zhaozhi Qian, Faroq AL-Tam, Ignacio Iacobacci, Muhammad AL-Qurishi, and Riad Souissi. 2026. Improving LLM Domain Certification with Pretrained Guide Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1494–1510, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Improving LLM Domain Certification with Pretrained Guide Models (Zhang et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.69.pdf