NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in African Low-Resource Languages

Mamadou K. Keita, Christopher M Homan, Huy Le


Abstract
We introduce negative space learning machine translation (NSL-MT), a training method for under-resourced languages that augments limited parallel data with synthetically generated violations of the target language’s grammar and explicitly penalizes the model when it assigns high probability to these linguistically invalid outputs. NSL-MT delivers improvements across all baselines we tested, including 3–12% BLEU gains for well-performing models and 56–89% gains for models lacking decent initial support. Furthermore, NSL-MT provides a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds standard training with 5,000 examples. NSL-MT thus provides a data-efficient alternative training method for settings where parallel data is limited.
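The abstract does not give the exact training objective, so the following is only a minimal sketch of the general idea, assuming an unlikelihood-style penalty: the usual negative log-likelihood on the reference translation, plus a term that grows as the model assigns more probability to a synthetic grammar violation. The function names, the `penalty_weight` hyperparameter, and the specific penalty form are all illustrative assumptions, not the authors' formulation.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of log-probabilities assigned to each token of one sequence."""
    return sum(math.log(p) for p in token_probs)


def nsl_style_loss(valid_token_probs, violation_token_probs, penalty_weight=0.5):
    """Hypothetical combined loss (an assumption, not the paper's objective).

    valid_token_probs: per-token probabilities for the reference translation.
    violation_token_probs: list of per-token probability lists, one list per
        synthetically generated grammar violation.
    penalty_weight: illustrative hyperparameter, not taken from the paper.
    """
    # Standard term: negative log-likelihood of the valid translation.
    nll = -sequence_log_prob(valid_token_probs)

    # Penalty term: larger when an invalid sequence is more probable.
    penalty = 0.0
    for probs in violation_token_probs:
        seq_prob = math.exp(sequence_log_prob(probs))
        penalty += -math.log(1.0 - seq_prob + 1e-12)  # epsilon avoids log(0)
    if violation_token_probs:
        penalty /= len(violation_token_probs)

    return nll + penalty_weight * penalty


if __name__ == "__main__":
    reference = [0.9, 0.8]
    confident_violation = [[0.9, 0.9]]   # model likes the invalid output
    unlikely_violation = [[0.1, 0.1]]    # model already rejects it
    print(nsl_style_loss(reference, confident_violation))
    print(nsl_style_loss(reference, unlikely_violation))
```

Under these assumptions, the loss is higher when the model is confident in the invalid output than when it already rejects it, which is the behavior the abstract describes as penalizing high probability on linguistically invalid outputs.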
Anthology ID:
2026.findings-acl.465
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9545–9560
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.465/
Cite (ACL):
Mamadou K. Keita, Christopher M Homan, and Huy Le. 2026. NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in African Low-Resource Languages. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9545–9560, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in African Low-Resource Languages (Keita et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.465.pdf
Checklist:
 2026.findings-acl.465.checklist.pdf