GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs

Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal


Abstract
Inference-time steering provides a lightweight alternative to fine-tuning large language models (LLMs) and vision-language models (VLMs) by modifying model activations without updating weights. However, existing methods often rely on a global intervention vector, overlook token-level causal influence, and underutilize model logits, especially in multimodal settings where visual and textual inputs contribute unevenly. We propose GrAInS, a contrastive, gradient-based approach that leverages Integrated Gradients to identify the top-k most influential tokens and constructs directional steering vectors based on their contribution to preferred over dispreferred outputs. These vectors are used to intervene on activations at each layer while preserving the representational scale. GrAInS outperforms fine-tuning and prior steering methods on both LLM and VLM tasks: it improves TruthfulQA accuracy by 13.22% (Llama-3.1-8B), reduces MMHal-Bench hallucinations from 0.624 to 0.514 (LLaVA-1.6-7B), and increases SPA-VL alignment by 8.11%, all without degrading fluency or general capabilities.
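The abstract compresses three steps: contrastive token attribution with Integrated Gradients (IG), selection of the top-k tokens, and a scale-preserving activation intervention. The NumPy sketch below illustrates those steps on a hypothetical toy model (mean-pooled `tanh` of token embeddings followed by a linear readout); every function name is illustrative and not taken from the authors' code. Two further choices are assumptions made only for this illustration: the steering vector is taken as the change in the hidden state when the top-k tokens are replaced by the zero baseline, and "preserving the representational scale" is modeled by rescaling the steered state back to its original L2 norm. The paper's exact construction (for example, how vectors are aggregated across examples and layers) is not reproduced here.

```python
import numpy as np

# Hypothetical toy "model": token embeddings X (T x d) -> pooled state h -> logits h @ W.
def hidden(X):
    return np.tanh(X).mean(axis=0)

def contrastive_score(X, W, pos, neg):
    # logit(preferred) - logit(dispreferred); the softmax normalizer cancels,
    # so this equals log p(pos) - log p(neg) under the toy model.
    return hidden(X) @ (W[:, pos] - W[:, neg])

def contrastive_grad(X, W, pos, neg):
    # Analytic gradient of contrastive_score with respect to every token embedding.
    return (1.0 - np.tanh(X) ** 2) * (W[:, pos] - W[:, neg]) / X.shape[0]

def integrated_gradients(X, baseline, grad_fn, steps=64):
    # Midpoint Riemann approximation of IG along the straight path baseline -> X.
    total = np.zeros_like(X)
    for a in (np.arange(steps) + 0.5) / steps:
        total += grad_fn(baseline + a * (X - baseline))
    return (X - baseline) * total / steps

def token_attributions(X, W, pos, neg, steps=64):
    # One contrastive score per token: sum IG over the embedding dimension.
    baseline = np.zeros_like(X)
    ig = integrated_gradients(
        X, baseline, lambda Z: contrastive_grad(Z, W, pos, neg), steps)
    return ig.sum(axis=1)

def steering_vector(X, scores, k):
    # ASSUMED construction: hidden-state change caused by replacing the top-k
    # tokens that most favor the preferred output with the zero baseline.
    top = np.argsort(scores)[::-1][:k]
    X_ablated = X.copy()
    X_ablated[top] = 0.0
    return hidden(X) - hidden(X_ablated), top

def steer(h, v, alpha=1.0):
    # Add the direction, then rescale to the original L2 norm (assumed
    # interpretation of "preserving the representational scale").
    h_new = h + alpha * v
    return h_new * (np.linalg.norm(h) / np.linalg.norm(h_new))

# Usage on random toy data: 6 tokens, 8-dim embeddings, 5-token vocabulary.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
W = rng.normal(size=(8, 5))
scores = token_attributions(X, W, pos=1, neg=3)
v, top = steering_vector(X, scores, k=2)
h = hidden(X)
h_steered = steer(h, v, alpha=2.0)
```

Because the baseline is all zeros and `tanh(0) = 0`, IG's completeness property means the per-token scores sum (up to discretization error) to the full contrastive score `contrastive_score(X, W, 1, 3)`, which is a quick sanity check on the attribution step.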
Anthology ID: 2026.acl-long.2159
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 46523–46543
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2159/
Cite (ACL): Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, and Mohit Bansal. 2026. GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 46523–46543, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs (Nguyen et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2159.pdf
Checklist: 2026.acl-long.2159.checklist.pdf