Assaf Arbelle
2026
Activation Reward Models for Few-Shot Model Alignment
Tianning Chai | Chancharik Mitra | Brandon Huang | Gautam Rajendrakumar Gare | Zhiqiu Lin | Assaf Arbelle | Leonid Karlinsky | Rogerio Feris | Trevor Darrell | Deva Ramanan | Roei Herzig
Findings of the Association for Computational Linguistics: ACL 2026
Aligning Large Language Models (LLMs) and Large Multimodal Models (LMMs) to human preferences is crucial for improving their real-world behavior. A common approach is to use reward models that enable reinforcement-learning post-training. However, traditional reward modeling requires finetuning on large preference datasets, limiting adaptability to new preferences. We introduce Activation Reward Models (Activation RMs)—the first mechanistic interpretability approach that steers LLM activations to align with few-shot preference data without finetuning. Our method combines activation denoising and output token likelihood scoring, achieving state-of-the-art performance on standard reward modeling benchmarks, surpassing zero-shot, few-shot, and voting-based baselines. We further demonstrate that Activation RMs mitigate reward hacking behaviors and remain robust to noisy exemplars and spurious reward signals. To evaluate this, we propose PreferenceHack, a novel few-shot benchmark testing reward models on reward hacking in a paired preference format, where Activation RMs achieve state-of-the-art performance, surpassing GPT-4o.
2024
NumeroLogic: Number Encoding for Enhanced LLMs’ Numerical Reasoning
Eli Schwartz | Leshem Choshen | Joseph Shtok | Sivan Doveh | Leonid Karlinsky | Assaf Arbelle
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to the non-intuitive textual representation of numbers. When a causal language model reads or generates a digit, it does not know that digit's place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose a simple adjustment to how numbers are represented: including the count of digits before each number. For instance, instead of “42”, we suggest using “2:42” as the new format. This approach, which we term NumeroLogic, offers an added advantage in number generation by serving as a Chain of Thought (CoT). By requiring the model to consider the number of digits first, it enhances the reasoning process before the actual number is generated. We use arithmetic tasks to demonstrate the effectiveness of the NumeroLogic formatting. We further demonstrate NumeroLogic's applicability to general natural language modeling, improving language understanding performance on the MMLU benchmark.
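The digit-count prefix described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `encode_numerologic` and `decode_numerologic` are hypothetical, and the sketch assumes only non-negative integers written as plain digit runs (decimals, signs, and thousands separators are not handled).

```python
import re

# Assumption: a "number" is any maximal run of ASCII digits.
_DIGITS = re.compile(r"\d+")
# Assumption: an encoded number looks like "<count>:<digits>".
_ENCODED = re.compile(r"\d+:(\d+)")


def encode_numerologic(text: str) -> str:
    """Prefix every digit run with its digit count, e.g. "42" -> "2:42"."""
    return _DIGITS.sub(lambda m: f"{len(m.group())}:{m.group()}", text)


def decode_numerologic(text: str) -> str:
    """Strip the "<count>:" prefix again, e.g. "2:42" -> "42"."""
    return _ENCODED.sub(lambda m: m.group(1), text)


if __name__ == "__main__":
    print(encode_numerologic("42"))                     # 2:42
    print(encode_numerologic("The sum of 7 and 1000"))  # The sum of 1:7 and 4:1000
```

Because the count precedes the digits, a left-to-right model sees the magnitude of the number before its first digit. Note that the decoder in this sketch would also rewrite unrelated text of the form `10:30`, so it is only suitable for text produced by the encoder.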
2023
Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs
Roei Herzig | Alon Mendelson | Leonid Karlinsky | Assaf Arbelle | Rogerio Feris | Trevor Darrell | Amir Globerson
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Vision and language models (VLMs) have demonstrated remarkable zero-shot (ZS) performance in a variety of tasks. However, recent works have shown that even the best VLMs struggle to capture aspects of compositional scene understanding, such as object attributes, relations, and action states. In contrast, obtaining structured annotations, such as scene graphs (SGs), that could improve these models is time-consuming and costly, and thus cannot be used on a large scale. Here we ask whether small SG datasets can provide sufficient information for enhancing structured understanding of pretrained VLMs. We show that it is indeed possible to improve VLMs when learning from SGs by integrating components that incorporate structured information into both visual and textual representations. For the visual side, we incorporate a special “SG Component” in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene. Our method improves the performance of several popular VLMs on multiple VL datasets with only a mild degradation in ZS capabilities.
FlowchartQA: The First Large-Scale Benchmark for Reasoning over Flowcharts
Simon Tannert | Marcelo G. Feighelstein | Jasmina Bogojeska | Joseph Shtok | Assaf Arbelle | Peter W. J. Staar | Anika Schumann | Jonas Kuhn | Leonid Karlinsky
Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing