IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation

Hossein Hosseini Kasnavieh, Gholamreza Haffari, Christopher Leckie, Adel N. Toosi


Abstract
A major challenge in operating large language models (LLMs) is predicting whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that uses introspective [CPX] tokens to let causal language models predict their own output quality during the prefilling phase without affecting generation. By introducing a token-conditional LoRA that activates only for the [CPX] token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question-answering benchmarks, IntroLM applied to Qwen3-8B achieves a ROC–AUC of 90% for success prediction, outperforming a DeBERTa-v3-Large classifier by 14%. When integrated into multi-model routing systems, IntroLM achieves superior cost–performance trade-offs, reducing end-to-end latency by up to 33% and large-model usage by up to 50% at matched reliability. Our code is available at https://github.com/hhosseini1377/LLM_routing.
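
The idea of a token-conditional LoRA can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `token_conditional_lora`, the `cpx_id` parameter, and the single-linear-layer setting are all assumptions made for illustration. The sketch applies a frozen base weight at every position and adds the low-rank update only where the token is the (hypothetical) [CPX] id, so every other position is computed exactly as the unmodified layer would compute it.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def token_conditional_lora(
    hidden: List[Vector],  # one hidden vector per token position
    token_ids: List[int],  # token id at each position
    w: Matrix,             # frozen base weight, shape (d_out, d_in)
    a: Matrix,             # LoRA down-projection, shape (r, d_in)
    b: Matrix,             # LoRA up-projection, shape (d_out, r)
    cpx_id: int,           # hypothetical id of the [CPX] token (assumption)
    scale: float = 1.0,    # LoRA scaling factor (assumption)
) -> List[Vector]:
    """Apply a linear layer whose LoRA update fires only at [CPX] positions."""
    outputs = []
    for h, tok in zip(hidden, token_ids):
        out = matvec(w, h)  # base path, identical for every token
        if tok == cpx_id:
            # Low-rank path: B (A h), added only for the [CPX] token.
            delta = matvec(b, matvec(a, h))
            out = [o + scale * d for o, d in zip(out, delta)]
        outputs.append(out)
    return outputs


# Toy usage: two positions, the second one holds the [CPX] token (id 99).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b = [[1.0], [0.0]]
result = token_conditional_lora(
    hidden=[[1.0, 2.0], [3.0, 4.0]],
    token_ids=[5, 99],
    w=w, a=a, b=b, cpx_id=99,
)
```

In this toy run the first position passes through the base weight unchanged, while the second position receives the extra low-rank term. A real model would apply the same masking inside each adapted layer and read a quality score from the [CPX] position; those details are outside this sketch.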
Anthology ID:
2026.findings-acl.598
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12313–12326
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598/
Cite (ACL):
Hossein Hosseini Kasnavieh, Gholamreza Haffari, Christopher Leckie, and Adel N. Toosi. 2026. IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12313–12326, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation (Kasnavieh et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598.pdf
Checklist:
 2026.findings-acl.598.checklist.pdf