Harnessing the Latent Space: From Steering Vectors to Model Calibrators for Control and Trust

Nishant Subramani


Abstract
Language models have evolved from unreliable text generators into highly capable models with trillions of parameters. These capability gains have come hand-in-hand with increases in scale, which makes the internal representations of models more challenging to understand. Since millions of users increasingly rely on language models to interact with external tools or to make decisions in medium- or high-stakes scenarios, we need to establish control over model behavior and know when to trust model outputs. In this paper, we discuss our contributions toward harnessing latent spaces: proposing steering vectors for control and developing latent space-based model calibrators for trust. Together, our contributions help demystify the latent spaces of language models and offer new insights into how to harness model internals to build more trustworthy language technology.
Anthology ID:
2026.bigpicture-main.10
Volume:
Proceedings of The Big Picture v2: Crafting a Research Narrative
Month:
July
Year:
2026
Address:
San Diego, CA, USA
Editors:
Yanai Elazar, Allyson Ettinger, Nora Kassner, Sebastian Ruder
Venues:
BigPicture | WS
Publisher:
Association for Computational Linguistics
Pages:
119–130
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.10/
Cite (ACL):
Nishant Subramani. 2026. Harnessing the Latent Space: From Steering Vectors to Model Calibrators for Control and Trust. In Proceedings of The Big Picture v2: Crafting a Research Narrative, pages 119–130, San Diego, CA, USA. Association for Computational Linguistics.
Cite (Informal):
Harnessing the Latent Space: From Steering Vectors to Model Calibrators for Control and Trust (Subramani, BigPicture 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.10.pdf