MEAV: Model Editing with Alignment Vectors for inference time LLM alignment in single and multidomain preference spectrum

Sadat Shahriar; Zheng Qi; Nikolaos Pappas; Srikanth Doss; Kishaloy Halder; Monica Sunkara; Manuel Mager; Yassine Benajiba

Abstract

Aligning Large Language Models (LLMs) to address subjectivity and nuanced preference levels requires adequate flexibility and control, and achieving this can be resource-intensive and time-consuming. Existing training-time alignment methods require full retraining whenever a change is needed, and inference-time methods typically require access to the reward model at each inference step. We introduce **MEAV**, an inference-time, model-editing-based LLM alignment method that learns encoded representations of preference dimensions, called *Alignment Vectors* (AV). These representations enable dynamic adjustment of model behavior during inference through simple linear operations. Here, we focus on three gradual response levels across three specialized domains (medical, legal, and financial) to illustrate the method's practical potential. This new alignment paradigm introduces adjustable preference knobs during inference, allowing users to tailor their LLM outputs while halving the inference cost compared to the prompt engineering approach. Additionally, we find that AVs are transferable across different fine-tuning stages of the same model, demonstrating their flexibility. AVs also facilitate multidomain, diverse preference alignment, making the process 12x faster than the retraining approach.
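The "simple linear operations" mentioned above can be illustrated with a toy sketch. The abstract does not spell out how an Alignment Vector is obtained, so this example assumes a task-vector-style definition: the per-parameter difference between an aligned model and its base model. The names `alignment_vector`, `edit_model`, and the strength values are hypothetical and chosen only for illustration; plain Python lists stand in for real model weights.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # toy stand-in for a model's named weights


def alignment_vector(aligned: Params, base: Params) -> Params:
    """Assumed definition (not confirmed by the abstract): aligned - base."""
    return {k: [a - b for a, b in zip(aligned[k], base[k])] for k in base}


def edit_model(base: Params, vectors: List[Params], strengths: List[float]) -> Params:
    """Return base + sum_i strengths[i] * vectors[i], leaving base untouched."""
    edited = {k: list(v) for k, v in base.items()}
    for vec, lam in zip(vectors, strengths):
        for k in edited:
            edited[k] = [w + lam * d for w, d in zip(edited[k], vec[k])]
    return edited


# Toy weights: one base model and two hypothetical domain-aligned variants.
base = {"w": [1.0, 2.0]}
medical_aligned = {"w": [2.0, 4.0]}
legal_aligned = {"w": [1.0, 0.0]}

av_med = alignment_vector(medical_aligned, base)
av_legal = alignment_vector(legal_aligned, base)

# Single-domain "knob": apply half of the medical alignment.
half_medical = edit_model(base, [av_med], [0.5])

# Multidomain: combine two vectors with different strengths.
mixed = edit_model(base, [av_med, av_legal], [1.0, 0.5])
```

Under this assumption, changing a preference level only requires choosing new strengths and recomputing a weighted sum, with no retraining and no reward model at inference time.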

Anthology ID: 2026.findings-acl.2035
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 40972–40985
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2035/
Cite (ACL): Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Kishaloy Halder, Monica Sunkara, Manuel Mager, and Yassine Benajiba. 2026. MEAV: Model Editing with Alignment Vectors for inference time LLM alignment in single and multidomain preference spectrum. In Findings of the Association for Computational Linguistics: ACL 2026, pages 40972–40985, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MEAV: Model Editing with Alignment Vectors for inference time LLM alignment in single and multidomain preference spectrum (Shahriar et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2035.pdf
Checklist: 2026.findings-acl.2035.checklist.pdf
