Gaperon: A Peppered English-French Generative Language Model Suite

Nathan Godey, Wissam Antoun, Rian Touchent, Rachel Bawden, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah


Abstract
Standardized benchmarks have become the dominant metric for measuring progress in large language models, yet their validity is increasingly compromised by data contamination and the unclear relationship between benchmark scores and genuine language understanding. We introduce Gaperon, a suite of fully open bilingual (French-English) language models designed as an experimental testbed to investigate evaluation dynamics under realistic training conditions. Our study makes three core contributions. First, we demonstrate mismatches between benchmark performance and generation quality: models that excel on benchmarks may underperform in qualitative text generation, and vice versa. Second, through our deliberately contaminated Gaperon-Garlic variant, we show that competitive benchmark scores can be recovered via late-stage contamination with only moderate degradation of generation quality, and surprisingly, such contamination also improves performance on held-out benchmarks. Third, we provide empirical evidence that widely used neural quality filters, particularly those trained to favor instructional or educational content, amplify benchmark contamination in pretraining corpora, with the DCLM classifier systematically ranking benchmark samples in the top-5 percentiles of samples. We release all models, data mixtures, checkpoints, and evaluation code to support reproducibility and further investigation.
Anthology ID:
2026.findings-acl.1955
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
39216–39257
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1955/
Cite (ACL):
Nathan Godey, Wissam Antoun, Rian Touchent, Rachel Bawden, Éric Villemonte de la Clergerie, Benoît Sagot, and Djamé Seddah. 2026. Gaperon: A Peppered English-French Generative Language Model Suite. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39216–39257, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Gaperon: A Peppered English-French Generative Language Model Suite (Godey et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1955.pdf
Checklist:
 2026.findings-acl.1955.checklist.pdf