Samira Ebrahimi Kahou
2026
Multimodal Large Language Models for Human-AI Interaction: Foundations, Agents, and Inclusive Applications
Shafiq Joty | Enamul Hoque | Ahmed Masry | Spandana Gella | Samira Ebrahimi Kahou
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts)
This tutorial presents foundations, agentic capabilities, and inclusive applications of multimodal large language models, covering architectures, multimodal alignment and reasoning, conversational GUI agents, accessibility, multilingual communication, and responsible deployment.
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Diganta Misra | Nizar Islah | Victor May | Brice Rauby | Zihan Wang | Justine Gehring | Antonio Orvieto | Muawiz Sajjad Chaudhary | Eilif B. Muller | Irina Rish | Samira Ebrahimi Kahou | Massimo Caccia
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task; enterprise models achieve baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods.
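As a rough illustration of what execution-based, version-conditioned evaluation could look like, the sketch below runs one candidate solution against the same unit test under two library versions and computes a success rate. It is not the GitChameleon 2.0 harness: `Problem`, `toy_environment`, `passes`, `success_rate`, and the library `toylib` with its `scale`/`rescale` rename are all hypothetical names invented for this example, and a real setup would install the pinned library version in an isolated environment instead of building a toy namespace.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    library: str    # library the task is conditioned on (toy name here)
    version: str    # version the solution must run against
    test_code: str  # executable unit test, written as plain asserts


def toy_environment(version):
    """Stand-in for an environment with the pinned version installed.

    Hypothetical toy API: `scale` was renamed to `rescale` in version 2.0.
    """
    def impl(values, factor):
        return [v * factor for v in values]

    name = "scale" if version.startswith("1.") else "rescale"
    return {name: impl}


def passes(candidate_src, problem):
    """Execute the candidate, then its unit test; True only if nothing raises."""
    namespace = toy_environment(problem.version)
    try:
        exec(candidate_src, namespace)
        exec(problem.test_code, namespace)
    except Exception:
        return False
    return True


def success_rate(results):
    """Fraction of problems whose unit test passed."""
    return sum(results) / len(results)


test = "assert double([1, 2]) == [2, 4]"
old = Problem("toylib", "1.4", test)
new = Problem("toylib", "2.0", test)

# A candidate that only knows the newer API name.
candidate = "def double(xs):\n    return rescale(xs, 2)\n"
results = [passes(candidate, old), passes(candidate, new)]
print(results, success_rate(results))
```

The same candidate fails under the older version and passes under the newer one, which is the kind of mismatch a version-conditioned benchmark is meant to expose.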
2024
On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization
Jordi Armengol-Estape | Vincent Michalski | Ramnath Kumar | Pierre-Luc St-Charles | Doina Precup | Samira Ebrahimi Kahou
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks, we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.
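The modulation step described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration of conditional batch normalization in general, not the paper's implementation: the linear `bridge`, the `1 + delta` parameterization of the scale, the helper `linear`, and all shapes and values are assumptions made for the example.

```python
import math


def linear(features, weights):
    """Matrix product: features (n, k) times weights (k, d) gives (n, d)."""
    return [[sum(f * w for f, w in zip(row, col)) for col in zip(*weights)]
            for row in features]


def bridge(aux_features, w_gamma, w_beta):
    """Hypothetical linear bridge from auxiliary features to (gamma, beta).

    gamma is predicted as 1 + delta, so zero weights give plain batch norm.
    """
    gamma = [[1.0 + g for g in row] for row in linear(aux_features, w_gamma)]
    beta = linear(aux_features, w_beta)
    return gamma, beta


def conditional_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply per-example scale/shift."""
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in x]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[i][j] * (col[i] - mean) * inv_std + beta[i][j]
    return out


x = [[1.0, 2.0], [3.0, 6.0]]   # classifier activations: 2 examples, 2 features
aux = [[1.0], [0.0]]           # auxiliary-network features: 2 examples, 1 feature

# Zero bridge weights: the layer reduces to ordinary batch normalization.
plain = conditional_batch_norm(x, *bridge(aux, [[0.0, 0.0]], [[0.0, 0.0]]))

# Non-zero shift weights: only the example with a non-zero auxiliary feature moves.
shifted = conditional_batch_norm(x, *bridge(aux, [[0.0, 0.0]], [[0.5, 0.5]]))
print(plain)
print(shifted)
```

Because the bridge adds its own weights and compute, a comparison against a baseline with matched parameter count is needed before attributing any gain to the language signal, which is the concern the abstract raises.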