Auguste Poiroux
2026
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Alejandro Hernández-Cano | Alexander Hägele | Allen Hao Huang | Angelika Romanou | Antoni-Joan Solergibert | Barna Pásztor | Bettina Messmer | Dhia Garbaya | Eduard Frank Ďurech | Ido Hakimi | Juan Garcia Giraldo | Mete Ismayilzada | Negar Foroutan | Skander Moalla | Tiancheng Chen | Vinko Sabolčec | Yixuan Xu | Michael Aerni | Badr AlKhamissi | Inés Altemir Marinas | Mohammad Hossein Amani | Matin Ansaripour | Ilia Badanin | Harold Benoit | Emanuela Boros | Nicholas John Browning | Fabian Bösch | Maximilian Böther | Niklas Canova | Camille Challier | Clément Charmillot | Jonathan Coles | Jan Milan Deriu | Arnout Devos | Lukas Drescher | Daniil Dzenhaliou | Maud Ehrmann | Dongyang Fan | Simin Fan | Silin Gao | Miguel Gila | María Grandury | Diba Hashemi | Alexander Miserlis Hoyle | Jiaming Jiang | Mark Klein | Andrei Kucharavy | Anastasiia Kucherenko | Frederike Lübeck | Roman Machacek | Theofilos Ioannis Manitaras | Andreas Marfurt | Kyle Matoba | Simon Matrenok | Henrique Mendonça | Fawzi Roberto Mohamed | Syrielle Montariol | Luca Mouchel | Sven Najem-Meyer | Jingwei Ni | Gennaro Oliva | Matteo Pagliardini | Elia Palme | Andrei Panferov | Léo Paoletti | Marco Passerini | Ivan Pavlov | Auguste Poiroux | Kaustubh Ponkshe | Nathan Ranchin | Javier Rando | Mathieu Sauser | Jakhongir Saydaliev | Mukhammadali Sayfiddinov | Marian Schneider | Stefano Schuppli | Marco Scialanga | Andrei Semenov | Kumar Shridhar | Raghav Singhal | Anna Sotnikova | Alexander Sternfeld | Ayush Kumar Tarun | Paul Teiletche | Jannis Vamvas | Xiaozhe Yao | Hao Zhao | Alexander Ilic | Ana Klimovic | Andreas Krause | Caglar Gulcehre | David Rosenthal | Elliott Ash | Florian Tramèr | Joost VandeVondele | Livio Veraldi | Martin Rajman | Thomas C. Schulthess | Torsten Hoefler | Antoine Bosselut | Martin Jaggi | Imanol Schlag
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Alejandro Hernández-Cano | Alexander Hägele | Allen Hao Huang | Angelika Romanou | Antoni-Joan Solergibert | Barna Pásztor | Bettina Messmer | Dhia Garbaya | Eduard Frank Ďurech | Ido Hakimi | Juan Garcia Giraldo | Mete Ismayilzada | Negar Foroutan | Skander Moalla | Tiancheng Chen | Vinko Sabolčec | Yixuan Xu | Michael Aerni | Badr AlKhamissi | Inés Altemir Marinas | Mohammad Hossein Amani | Matin Ansaripour | Ilia Badanin | Harold Benoit | Emanuela Boros | Nicholas John Browning | Fabian Bösch | Maximilian Böther | Niklas Canova | Camille Challier | Clément Charmillot | Jonathan Coles | Jan Milan Deriu | Arnout Devos | Lukas Drescher | Daniil Dzenhaliou | Maud Ehrmann | Dongyang Fan | Simin Fan | Silin Gao | Miguel Gila | María Grandury | Diba Hashemi | Alexander Miserlis Hoyle | Jiaming Jiang | Mark Klein | Andrei Kucharavy | Anastasiia Kucherenko | Frederike Lübeck | Roman Machacek | Theofilos Ioannis Manitaras | Andreas Marfurt | Kyle Matoba | Simon Matrenok | Henrique Mendonça | Fawzi Roberto Mohamed | Syrielle Montariol | Luca Mouchel | Sven Najem-Meyer | Jingwei Ni | Gennaro Oliva | Matteo Pagliardini | Elia Palme | Andrei Panferov | Léo Paoletti | Marco Passerini | Ivan Pavlov | Auguste Poiroux | Kaustubh Ponkshe | Nathan Ranchin | Javier Rando | Mathieu Sauser | Jakhongir Saydaliev | Mukhammadali Sayfiddinov | Marian Schneider | Stefano Schuppli | Marco Scialanga | Andrei Semenov | Kumar Shridhar | Raghav Singhal | Anna Sotnikova | Alexander Sternfeld | Ayush Kumar Tarun | Paul Teiletche | Jannis Vamvas | Xiaozhe Yao | Hao Zhao | Alexander Ilic | Ana Klimovic | Andreas Krause | Caglar Gulcehre | David Rosenthal | Elliott Ash | Florian Tramèr | Joost VandeVondele | Livio Veraldi | Martin Rajman | Thomas C. Schulthess | Torsten Hoefler | Antoine Bosselut | Martin Jaggi | Imanol Schlag
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Open LLMs enable AI practitioners to control development costs by building on an existing foundation for downstream applications. While offering substantial promise, current models often fail to meet the needs of users needing open solutions aligned with responsible AI principles, including data compliance, transparency, and inclusivity. In this work, we present Apertus, a fully open suite of large language models (LLMs) designed to address responsibility shortcomings in today’s open model ecosystem, namely data responsibility and global representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of data memorization, we also adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. Apertus also drastically expands multilingual coverage, training on 15T tokens from over approximately 1800 languages, with about 40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivaling or surpassing open-weight counterparts.
2025
RLMEval: Evaluating Research-Level Neural Theorem Proving
Auguste Poiroux | Antoine Bosselut | Viktor Kunčak
Findings of the Association for Computational Linguistics: EMNLP 2025
Auguste Poiroux | Antoine Bosselut | Viktor Kunčak
Findings of the Association for Computational Linguistics: EMNLP 2025
Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. We introduce RLMEval, an evaluation suite for these tasks, focusing on research-level mathematics from real-world Lean formalization projects. RLMEval targets the evaluation of neural theorem proving and proof autoformalization on challenging research-level theorems by leveraging real Lean Blueprint formalization projects. Our evaluation of state-of-the-art models on RLMEval, comprising 613 theorems from 6 Lean projects, reveals a significant gap: progress on existing benchmarks does not readily translate to these more realistic settings, with the best model achieving only a 10.3% pass rate. RLMEval provides a new, challenging benchmark designed to guide and accelerate progress in automated reasoning for formal mathematics.
Reliable Evaluation and Benchmarks for Statement Autoformalization
Auguste Poiroux | Gail Weiss | Viktor Kunčak | Antoine Bosselut
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Auguste Poiroux | Gail Weiss | Viktor Kunčak | Antoine Bosselut
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Evaluating statement autoformalization, translating natural language mathematics into formal languages like Lean 4, remains a significant challenge, with few metrics, datasets, and standards to robustly measure progress. In this work, we present a comprehensive approach combining improved metrics, robust benchmarks, and systematic evaluation, to fill this gap. First, we introduce BEq+, an automated metric that correlates strongly with human judgment, along with ProofNetVerif, a new dataset for assessing the quality of evaluation metrics, containing 3,752 annotated examples. Second, we develop two new autoformalization benchmarks: ProofNet#, a corrected version of ProofNet, and RLM25, with 619 new pairs of research-level mathematics from six formalization projects. Through systematic experimentation across these benchmarks, we find that current techniques can achieve up to 45.1% accuracy on undergraduate mathematics but struggle with research-level content without proper context. Our work establishes a reliable foundation for evaluating and advancing autoformalization systems.
Search
Fix author
Co-authors
- Antoine Bosselut 3
- Viktor Kunčak 2
- Michael Aerni 1
- Badr AlKhamissi 1
- Mohammad Hossein Amani 1
- Matin Ansaripour 1
- Elliott Ash 1
- Ilia Badanin 1
- Harold Benoit 1
- Emanuela Boroş 1
- Nicholas John Browning 1
- Fabian Bösch 1
- Maximilian Böther 1
- Niklas Canova 1
- Camille Challier 1
- Clément Charmillot 1
- Tiancheng Chen 1
- Jonathan Coles 1
- Jan Milan Deriu 1
- Arnout Devos 1
- Lukas Drescher 1
- Daniil Dzenhaliou 1
- Maud Ehrmann 1
- Dongyang Fan 1
- Simin Fan 1
- Negar Foroutan 1
- Silin Gao 1
- Dhia Garbaya 1
- Miguel Gila 1
- Juan Garcia Giraldo 1
- María Grandury 1
- Çağlar Gu̇lçehre 1
- Ido Hakimi 1
- Diba Hashemi 1
- Alejandro Hernández-Cano 1
- Torsten Hoefler 1
- Alexander Miserlis Hoyle 1
- Allen Hao Huang 1
- Alexander Hägele 1
- Alexander Ilic 1
- Mete Ismayilzada 1
- Martin Jaggi 1
- Jiaming Jiang 1
- Mark Klein 1
- Ana Klimovic 1
- Andreas Krause 1
- Andrei Kucharavy 1
- Anastasiia Kucherenko 1
- Frederike Lübeck 1
- Roman Machacek 1
- Theofilos Ioannis Manitaras 1
- Andreas Marfurt 1
- Inés Altemir Marinas 1
- Kyle Matoba 1
- Simon Matrenok 1
- Henrique Mendonça 1
- Bettina Messmer 1
- Skander Moalla 1
- Fawzi Roberto Mohamed 1
- Syrielle Montariol 1
- Luca Mouchel 1
- Sven Najem-Meyer 1
- Jingwei Ni 1
- Gennaro Oliva 1
- Matteo Pagliardini 1
- Elia Palme 1
- Andrei Panferov 1
- Léo Paoletti 1
- Marco Passerini 1
- Ivan Pavlov 1
- Kaustubh Ponkshe 1
- Barna Pásztor 1
- Martin Rajman 1
- Nathan Ranchin 1
- Javier Rando 1
- Angelika Romanou 1
- David Rosenthal 1
- Vinko Sabolčec 1
- Mathieu Sauser 1
- Jakhongir Saydaliev 1
- Mukhammadali Sayfiddinov 1
- Imanol Schlag 1
- Marian Schneider 1
- Thomas C. Schulthess 1
- Stefano Schuppli 1
- Marco Scialanga 1
- Andrei Semenov 1
- Kumar Shridhar 1
- Raghav Singhal 1
- Antoni-Joan Solergibert 1
- Anna Sotnikova 1
- Alexander Sternfeld 1
- Ayush Kumar Tarun 1
- Paul Teiletche 1
- Florian Tramèr 1
- Jannis Vamvas 1
- Joost VandeVondele 1
- Livio Veraldi 1
- Gail Weiss 1
- Yixuan Xu 1
- Xiaozhe Yao 1
- Hao Zhao 1
- Eduard Frank Ďurech 1