When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation

David Tan; Pinzhen Chen; Josef van Genabith; Koel Dutta Chowdhury

Abstract

Large language models (LLMs) can be benchmark-contaminated, resulting in inflated scores that mask memorization as generalization, and in multilingual settings, this memorization can even transfer to "uncontaminated" languages. Using the FLORES-200 translation benchmark as a diagnostic, we study two 7-8B instruction-tuned multilingual LLMs: Bloomz, which was trained on FLORES, and Llama as an uncontaminated control. We confirm Bloomz’s FLORES contamination and demonstrate that machine translation contamination can be cross-directional, artificially boosting performance in unseen translation directions due to target-side memorization. Further analysis shows that recall of memorized references often persists despite various source-side perturbation efforts like paraphrasing and named entity replacement. However, replacing named entities leads to a consistent decrease in BLEU, suggesting an effective probing method for memorization in contaminated models.

Anthology ID: 2026.eacl-short.26
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 345–358
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.26/
Cite (ACL): David Tan, Pinzhen Chen, Josef Van Genabith, and Koel Dutta Chowdhury. 2026. When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 345–358, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation (Tan et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.26.pdf
Checklist: 2026.eacl-short.26.checklist.pdf
