Angelika Romanou
2026
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Alejandro Hernández-Cano | Alexander Hägele | Allen Hao Huang | Angelika Romanou | Antoni-Joan Solergibert | Barna Pásztor | Bettina Messmer | Dhia Garbaya | Eduard Frank Ďurech | Ido Hakimi | Juan Garcia Giraldo | Mete Ismayilzada | Negar Foroutan | Skander Moalla | Tiancheng Chen | Vinko Sabolčec | Yixuan Xu | Michael Aerni | Badr AlKhamissi | Inés Altemir Marinas | Mohammad Hossein Amani | Matin Ansaripour | Ilia Badanin | Harold Benoit | Emanuela Boros | Nicholas John Browning | Fabian Bösch | Maximilian Böther | Niklas Canova | Camille Challier | Clément Charmillot | Jonathan Coles | Jan Milan Deriu | Arnout Devos | Lukas Drescher | Daniil Dzenhaliou | Maud Ehrmann | Dongyang Fan | Simin Fan | Silin Gao | Miguel Gila | María Grandury | Diba Hashemi | Alexander Miserlis Hoyle | Jiaming Jiang | Mark Klein | Andrei Kucharavy | Anastasiia Kucherenko | Frederike Lübeck | Roman Machacek | Theofilos Ioannis Manitaras | Andreas Marfurt | Kyle Matoba | Simon Matrenok | Henrique Mendonça | Fawzi Roberto Mohamed | Syrielle Montariol | Luca Mouchel | Sven Najem-Meyer | Jingwei Ni | Gennaro Oliva | Matteo Pagliardini | Elia Palme | Andrei Panferov | Léo Paoletti | Marco Passerini | Ivan Pavlov | Auguste Poiroux | Kaustubh Ponkshe | Nathan Ranchin | Javier Rando | Mathieu Sauser | Jakhongir Saydaliev | Mukhammadali Sayfiddinov | Marian Schneider | Stefano Schuppli | Marco Scialanga | Andrei Semenov | Kumar Shridhar | Raghav Singhal | Anna Sotnikova | Alexander Sternfeld | Ayush Kumar Tarun | Paul Teiletche | Jannis Vamvas | Xiaozhe Yao | Hao Zhao | Alexander Ilic | Ana Klimovic | Andreas Krause | Caglar Gulcehre | David Rosenthal | Elliott Ash | Florian Tramèr | Joost VandeVondele | Livio Veraldi | Martin Rajman | Thomas C. Schulthess | Torsten Hoefler | Antoine Bosselut | Martin Jaggi | Imanol Schlag
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Open LLMs enable AI practitioners to control development costs by building on an existing foundation for downstream applications. While offering substantial promise, current models often fail to meet the needs of users who require open solutions aligned with responsible AI principles, including data compliance, transparency, and inclusivity. In this work, we present Apertus, a fully open suite of large language models (LLMs) designed to address two responsibility shortcomings in today’s open model ecosystem: data responsibility and global representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering out non-permissive, toxic, and personally identifiable content. To mitigate the risk of data memorization, we also adopt the Goldfish objective during pretraining, which strongly suppresses verbatim recall of training data while retaining downstream task performance. Apertus also drastically expands multilingual coverage, training on 15T tokens from more than 1,800 languages, with about 40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivaling or surpassing open-weight counterparts.
2025
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
Negar Foroutan | Angelika Romanou | Matin Ansaripour | Julian Martin Eisenschlos | Karl Aberer | Rémi Lebret
Findings of the Association for Computational Linguistics: ACL 2025
Documents are fundamental to preserving and disseminating information, often incorporating complex layouts, tables, and charts that pose significant challenges for automatic document understanding (DU). While vision-language large models (VLLMs) have demonstrated improvements across various tasks, their effectiveness in processing long-context vision inputs remains unclear. This paper introduces WikiMixQA, a benchmark comprising 1,000 multiple-choice questions (MCQs) designed to evaluate cross-modal reasoning over tables and charts extracted from 4,000 Wikipedia pages spanning seven distinct topics. Unlike existing benchmarks, WikiMixQA emphasizes complex reasoning by requiring models to synthesize information from multiple modalities. We evaluate 12 state-of-the-art vision-language models and find that while proprietary models achieve ~70% accuracy when provided with direct context, their performance deteriorates significantly when retrieval from long documents is required. In this setting, GPT-4o is the only model exceeding 50% accuracy, whereas open-source models perform considerably worse, with a maximum accuracy of 27%. These findings underscore the challenges of long-context, multimodal reasoning and establish WikiMixQA as a crucial benchmark for advancing document understanding research.
CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments
Rishika Bhagwatkar | Syrielle Montariol | Angelika Romanou | Beatriz Borges | Irina Rish | Antoine Bosselut
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Humans can naturally identify, reason about, and explain anomalies in their environment. In computer vision, research on this long-standing challenge has remained limited to industrial defects or unrealistic, synthetically generated anomalies, failing to capture the richness and unpredictability of real-world anomalies. In this work, we introduce CAVE, the first benchmark of real-world visual anomalies. CAVE supports three open-ended tasks (anomaly description, explanation, and justification) and provides fine-grained annotations for visual grounding and for categorizing anomalies by their visual manifestations, complexity, severity, and commonness. These annotations draw inspiration from cognitive science research on how humans identify and resolve anomalies, providing a comprehensive framework for evaluating Vision-Language Models (VLMs) in detecting and understanding anomalies. We show that state-of-the-art VLMs struggle with visual anomaly perception and commonsense reasoning, even with advanced prompting strategies. By offering a realistic and cognitively grounded benchmark, CAVE serves as a valuable resource for advancing research on anomaly detection and commonsense reasoning in VLMs.
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh | Angelika Romanou | Clémentine Fourrier | David Ifeoluwa Adelani | Jian Gang Ngui | Daniel Vila-Suero | Peerat Limkonchotiwat | Kelly Marchisio | Wei Qi Leong | Yosephine Susanto | Raymond Ng | Shayne Longpre | Sebastian Ruder | Wei-Yin Ko | Antoine Bosselut | Alice Oh | Andre Martins | Leshem Choshen | Daphne Ippolito | Enzo Ferrante | Marzieh Fadaee | Beyza Ermis | Sara Hooker
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reliable multilingual evaluation is difficult, and culturally appropriate evaluation is even harder to achieve. A common practice to fill this gap is to machine-translate English evaluation sets. However, translation introduces language bias and carries over cultural and regional assumptions from the original questions, often testing knowledge irrelevant to the target audience. In this work, we highlight the extent and impact of these biases and present a multilingual evaluation framework that aims to mitigate them through improved translations and annotation practices. Through a large-scale study involving professional and community translators and annotators, we show that state-of-the-art models excel primarily by learning Western-centric concepts. Notably, we find that model rankings on the full MMLU change when models are evaluated on a subset of questions explicitly marked as culturally sensitive. We release Global MMLU, a multilingual extension of MMLU across 42 languages, featuring improved translation quality, expanded language coverage, and designated subsets labeled as culturally sensitive and culturally agnostic to enable a more comprehensive and equitable benchmark for evaluating language models across diverse linguistic and cultural contexts.
2023
CRAB: Assessing the Strength of Causal Relationships Between Real-world Events
Angelika Romanou | Syrielle Montariol | Debjit Paul | Leo Laugier | Karl Aberer | Antoine Bosselut
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. CRAB contains fine-grained, contextual causality annotations for ~2.7K pairs of real-world events that describe various newsworthy event timelines (e.g., the acquisition of Twitter by Elon Musk). Using CRAB, we measure the performance of several large language models, demonstrating that most systems achieve poor performance on the task. Motivated by classical causal principles, we also analyze the causal structures of groups of events in CRAB, and find that models perform worse on causal reasoning when events are derived from complex causal structures compared to simple linear causal chains. We make our dataset and code available to the research community.
2022
Multilingual Text Summarization on Financial Documents
Negar Foroutan | Angelika Romanou | Stéphane Massonnet | Rémi Lebret | Karl Aberer
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022
This paper proposes a multilingual Automated Text Summarization (ATS) method for the Financial Narrative Summarization Task (FNS-2022). We developed two systems: the first uses a pre-trained abstractive summarization model fine-tuned on the downstream objective, and the second treats the problem as extractive, performing a similarity search over trained span representations. The language models were fine-tuned on a collection of financial documents in three languages (English, Spanish, and Greek), and both systems aim to identify the beginning of the continuous summary narrative section of the document. The proposed systems achieve high performance on the task, with the sequence-to-sequence variant ranked 1st by ROUGE-2 F1 score on the test set for each of the three languages.
Co-authors
- Antoine Bosselut 4
- Karl Aberer 3
- Negar Foroutan 3
- Syrielle Montariol 3
- Matin Ansaripour 2
- Rémi Lebret 2
- David Ifeoluwa Adelani 1
- Michael Aerni 1
- Badr AlKhamissi 1
- Mohammad Hossein Amani 1
- Elliott Ash 1
- Ilia Badanin 1
- Harold Benoit 1
- Rishika Bhagwatkar 1
- Beatriz Borges 1
- Emanuela Boroş 1
- Nicholas John Browning 1
- Fabian Bösch 1
- Maximilian Böther 1
- Niklas Canova 1
- Camille Challier 1
- Clément Charmillot 1
- Tiancheng Chen 1
- Leshem Choshen 1
- Jonathan Coles 1
- Jan Milan Deriu 1
- Arnout Devos 1
- Lukas Drescher 1
- Daniil Dzenhaliou 1
- Maud Ehrmann 1
- Julian Martin Eisenschlos 1
- Beyza Ermis 1
- Marzieh Fadaee 1
- Dongyang Fan 1
- Simin Fan 1
- Enzo Ferrante 1
- Clémentine Fourrier 1
- Silin Gao 1
- Dhia Garbaya 1
- Miguel Gila 1
- Juan Garcia Giraldo 1
- María Grandury 1
- Çağlar Gülçehre 1
- Ido Hakimi 1
- Diba Hashemi 1
- Alejandro Hernández-Cano 1
- Torsten Hoefler 1
- Sara Hooker 1
- Alexander Miserlis Hoyle 1
- Allen Hao Huang 1
- Alexander Hägele 1
- Alexander Ilic 1
- Daphne Ippolito 1
- Mete Ismayilzada 1
- Martin Jaggi 1
- Jiaming Jiang 1
- Mark Klein 1
- Ana Klimovic 1
- Wei-Yin Ko 1
- Andreas Krause 1
- Andrei Kucharavy 1
- Anastasiia Kucherenko 1
- Léo Laugier 1
- Wei Qi Leong 1
- Peerat Limkonchotiwat 1
- Shayne Longpre 1
- Frederike Lübeck 1
- Roman Machacek 1
- Theofilos Ioannis Manitaras 1
- Kelly Marchisio 1
- Andreas Marfurt 1
- Inés Altemir Marinas 1
- André F. T. Martins 1
- Stéphane Massonnet 1
- Kyle Matoba 1
- Simon Matrenok 1
- Henrique Mendonça 1
- Bettina Messmer 1
- Skander Moalla 1
- Fawzi Roberto Mohamed 1
- Luca Mouchel 1
- Sven Najem-Meyer 1
- Raymond Ng 1
- Jian Gang Ngui 1
- Jingwei Ni 1
- Alice Oh 1
- Gennaro Oliva 1
- Matteo Pagliardini 1
- Elia Palme 1
- Andrei Panferov 1
- Léo Paoletti 1
- Marco Passerini 1
- Debjit Paul 1
- Ivan Pavlov 1
- Auguste Poiroux 1
- Kaustubh Ponkshe 1
- Barna Pásztor 1
- Martin Rajman 1
- Nathan Ranchin 1
- Javier Rando 1
- Irina Rish 1
- David Rosenthal 1
- Sebastian Ruder 1
- Vinko Sabolčec 1
- Mathieu Sauser 1
- Jakhongir Saydaliev 1
- Mukhammadali Sayfiddinov 1
- Imanol Schlag 1
- Marian Schneider 1
- Thomas C. Schulthess 1
- Stefano Schuppli 1
- Marco Scialanga 1
- Andrei Semenov 1
- Kumar Shridhar 1
- Shivalika Singh 1
- Raghav Singhal 1
- Antoni-Joan Solergibert 1
- Anna Sotnikova 1
- Alexander Sternfeld 1
- Yosephine Susanto 1
- Ayush Kumar Tarun 1
- Paul Teiletche 1
- Florian Tramèr 1
- Jannis Vamvas 1
- Joost VandeVondele 1
- Livio Veraldi 1
- Daniel Vila-Suero 1
- Yixuan Xu 1
- Xiaozhe Yao 1
- Hao Zhao 1
- Eduard Frank Ďurech 1