IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation
Hossein Hosseini Kasnavieh, Gholamreza Haffari, Christopher Leckie, Adel N. Toosi
Abstract
A major challenge in operating large language models (LLMs) is predicting whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that uses introspective [CPX] tokens to let causal language models predict their own output quality during the prefilling phase without affecting generation. By introducing token-conditional LoRA that activates only for the [CPX] token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question-answering benchmarks, IntroLM applied to Qwen3-8B achieves a ROC–AUC of 90% for success prediction, outperforming a DeBERTa-v3-Large classifier by 14%. When integrated into multi-model routing systems, IntroLM achieves superior cost–performance trade-offs, reducing end-to-end latency by up to 33% and large-model usage by up to 50% at matched reliability. Our code is available at https://github.com/hhosseini1377/LLM_routing.
- Anthology ID: 2026.findings-acl.598
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 12313–12326
- URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598/
- Cite (ACL): Hossein Hosseini Kasnavieh, Gholamreza Haffari, Christopher Leckie, and Adel N. Toosi. 2026. IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12313–12326, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation (Kasnavieh et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598.pdf