Adel N. Toosi

2026

IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation
Hossein Hosseini Kasnavieh | Gholamreza Haffari | Christopher Leckie | Adel N. Toosi
Findings of the Association for Computational Linguistics: ACL 2026

A major challenge in operating large language models (LLMs) is predicting whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that uses introspective [CPX] tokens to let causal language models predict their own output quality during the prefilling phase without affecting generation. A token-conditional LoRA that activates only for the [CPX] token lets the model learn to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question-answering benchmarks, IntroLM applied to Qwen3-8B achieves a ROC–AUC of 90% for success prediction, outperforming a DeBERTa-v3-Large classifier by 14%. When integrated into multi-model routing systems, IntroLM achieves superior cost–performance trade-offs, reducing end-to-end latency by up to 33% and large-model usage by up to 50% at matched reliability. Our code is available at https://github.com/hhosseini1377/LLM_routing.
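The token-conditional LoRA idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the low-rank update is simply masked per token position, so that only positions flagged as [CPX] receive it. The function names, the `is_cpx` mask, and the toy matrices are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def token_conditional_lora(
    hidden: List[Vector],  # one hidden vector per token position
    is_cpx: List[bool],    # hypothetical mask: True where the token is [CPX]
    W: Matrix,             # frozen backbone weight (d_out x d_in)
    A: Matrix,             # LoRA down-projection (r x d_in)
    B: Matrix,             # LoRA up-projection (d_out x r)
    scale: float = 1.0,    # LoRA scaling factor
) -> List[Vector]:
    """Apply W to every token; add the low-rank update only at [CPX] positions."""
    outputs = []
    for h, flag in zip(hidden, is_cpx):
        out = matvec(W, h)  # backbone path, identical for all tokens
        if flag:
            # Assumption: the adapter contributes only for the [CPX] token.
            delta = matvec(B, matvec(A, h))
            out = [o + scale * d for o, d in zip(out, delta)]
        outputs.append(out)
    return outputs


# Toy example: identity backbone, rank-1 adapter, second token is [CPX].
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [-0.5]]
print(token_conditional_lora([[1.0, 2.0], [3.0, 4.0]], [False, True], W, A, B))
# prints [[1.0, 2.0], [6.5, 0.5]]
```

In this sketch the first (ordinary) token passes through the frozen weight unchanged, which is one way to read the claim that the original backbone behavior is preserved, while only the [CPX] position is altered by the adapter.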

