When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation
David Tan, Pinzhen Chen, Josef Van Genabith, Koel Dutta Chowdhury
Abstract
Large language models (LLMs) can be benchmark-contaminated, resulting in inflated scores that mask memorization as generalization, and in multilingual settings, this memorization can even transfer to "uncontaminated" languages. Using the FLORES-200 translation benchmark as a diagnostic, we study two 7-8B instruction-tuned multilingual LLMs: Bloomz, which was trained on FLORES, and Llama as an uncontaminated control. We confirm Bloomz’s FLORES contamination and demonstrate that machine translation contamination can be cross-directional, artificially boosting performance in unseen translation directions due to target-side memorization. Further analysis shows that recall of memorized references often persists despite various source-side perturbation efforts like paraphrasing and named entity replacement. However, replacing named entities leads to a consistent decrease in BLEU, suggesting an effective probing method for memorization in contaminated models.
- Anthology ID:
- 2026.eacl-short.26
- Volume:
- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 345–358
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.26/
- Cite (ACL):
- David Tan, Pinzhen Chen, Josef Van Genabith, and Koel Dutta Chowdhury. 2026. When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 345–358, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation (Tan et al., EACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.26.pdf