IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation

Hossein Hosseini Kasnavieh; Gholamreza Haffari; Christopher Leckie; Adel N. Toosi

Abstract

A major challenge in operating large language models (LLMs) is predicting whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that uses introspective [CPX] tokens to let causal language models predict their own output quality during the prefilling phase, without affecting generation. By introducing a token-conditional LoRA that activates only for the [CPX] token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question-answering benchmarks, IntroLM applied to Qwen3-8B achieves a ROC–AUC of 90% for success prediction, outperforming a DeBERTa-v3-Large classifier by 14%. When integrated into multi-model routing systems, IntroLM achieves superior cost–performance trade-offs, reducing end-to-end latency by up to 33% and large-model usage by up to 50% at matched reliability. Our code is available at https://github.com/hhosseini1377/LLM_routing.
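To make the idea of a token-conditional LoRA concrete, the following is a minimal NumPy sketch of a single linear layer whose low-rank update is applied only at the [CPX] position. It is an illustration under stated assumptions, not the authors' implementation: the function name `token_conditional_lora`, the mask `is_cpx`, the tensor shapes, and the placement of [CPX] as the last token of the sequence are all hypothetical choices made for this example.

```python
import numpy as np

def token_conditional_lora(x, W, A, B, is_cpx, scale=1.0):
    """Linear layer with a LoRA update gated per token (illustrative only).

    x:      (seq_len, d_in) hidden states
    W:      (d_in, d_out) frozen backbone weight
    A, B:   (d_in, r) and (r, d_out) low-rank adapter factors
    is_cpx: (seq_len,) boolean mask, True only at the [CPX] position
    """
    base = x @ W                      # frozen backbone projection
    delta = (x @ A) @ B * scale       # low-rank LoRA update
    # Zero the update everywhere except the [CPX] position.
    return base + delta * is_cpx[:, None]

rng = np.random.default_rng(0)
seq_len, d_in, d_out, rank = 5, 8, 8, 2
x = rng.normal(size=(seq_len, d_in))
W = rng.normal(size=(d_in, d_out))
A = rng.normal(size=(d_in, rank))
B = rng.normal(size=(rank, d_out))

# Assumption: [CPX] is the final token of the sequence.
is_cpx = np.array([False, False, False, False, True])

out = token_conditional_lora(x, W, A, B, is_cpx)
base = x @ W
```

In this sketch, the rows of `out` for ordinary tokens are identical to the frozen projection `base`, and only the [CPX] row differs. This mirrors, in a simplified form, the abstract's claim that the adapter leaves the original backbone behavior intact.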

Anthology ID: 2026.findings-acl.598
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12313–12326
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598/
Cite (ACL): Hossein Hosseini Kasnavieh, Gholamreza Haffari, Christopher Leckie, and Adel N. Toosi. 2026. IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12313–12326, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation (Kasnavieh et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.598.pdf
Checklist: 2026.findings-acl.598.checklist.pdf
