Gaperon: A Peppered English-French Generative Language Model Suite
Nathan Godey, Wissam Antoun, Rian Touchent, Rachel Bawden, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah
Abstract
Standardized benchmarks have become the dominant metric for measuring progress in large language models, yet their validity is increasingly compromised by data contamination and the unclear relationship between benchmark scores and genuine language understanding. We introduce Gaperon, a suite of fully open bilingual (French-English) language models designed as an experimental testbed to investigate evaluation dynamics under realistic training conditions. Our study makes three core contributions. First, we demonstrate mismatches between benchmark performance and generation quality: models that excel on benchmarks may underperform in qualitative text generation, and vice versa. Second, through our deliberately contaminated Gaperon-Garlic variant, we show that competitive benchmark scores can be recovered via late-stage contamination with only moderate degradation of generation quality, and surprisingly, such contamination also improves performance on held-out benchmarks. Third, we provide empirical evidence that widely used neural quality filters, particularly those trained to favor instructional or educational content, amplify benchmark contamination in pretraining corpora, with the DCLM classifier systematically ranking benchmark samples in the top-5 percentiles of samples. We release all models, data mixtures, checkpoints, and evaluation code to support reproducibility and further investigation.
- Anthology ID:
- 2026.findings-acl.1955
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 39216–39257
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1955/
- Cite (ACL):
- Nathan Godey, Wissam Antoun, Rian Touchent, Rachel Bawden, Éric Villemonte de la Clergerie, Benoît Sagot, and Djamé Seddah. 2026. Gaperon: A Peppered English-French Generative Language Model Suite. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39216–39257, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Gaperon: A Peppered English-French Generative Language Model Suite (Godey et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1955.pdf