Mitigating Tokenization-Induced Distance Distortion in Long-Context Multilingual Machine Translation

Khotso Selialia, Antoine Nzeyimana, Fatima M. Anwar


Abstract
Multilingual neural machine translation (MNMT) models degrade as input context length increases, since longer inputs cause positional encoding schemes to misinterpret token distances. Existing absolute and relative positional encodings rely on fixed token indices and implicitly assume uniform semantic density, an assumption that breaks down for long-context inputs. We introduce DCARPE, a tokenization-aware adaptive positional encoding that conditions relative positional bias on input-level sequence length and fragmentation statistics, allowing the model to reinterpret positional distance when inflation arises from tokenization rather than from semantic factors. Evaluations on JW300 and out-of-distribution FLORES-200 demonstrate consistent improvements in long-context robustness, with gains of up to +10.81 ChrF++ and +8.00 BLEU over baselines.
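The general idea of conditioning relative position on fragmentation can be illustrated with a small sketch. This is not the DCARPE formulation, which the abstract does not specify; it is a hypothetical illustration that assumes fragmentation is measured as subword tokens per word and that token distance is divided by that ratio before being mapped to a log-spaced bias bucket. The function names (fragmentation_ratio, adjusted_distance, bias_bucket), the bucket count, and the maximum distance are all assumptions made for the example, and the sequence-length conditioning mentioned in the abstract is left out.

```python
import math


def fragmentation_ratio(num_subword_tokens: int, num_words: int) -> float:
    """Average number of subword tokens per word (assumed fragmentation statistic)."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return num_subword_tokens / num_words


def adjusted_distance(i: int, j: int, ratio: float) -> float:
    """Token distance rescaled to approximate word units.

    Assumption: a heavily fragmented input (ratio > 1) inflates token
    distance, so the raw distance is divided by the ratio. Ratios below 1
    are clamped so distances are never stretched.
    """
    return abs(i - j) / max(ratio, 1.0)


def bias_bucket(distance: float, num_buckets: int = 8,
                max_distance: float = 128.0) -> int:
    """Map a distance to a relative-bias bucket index.

    The first half of the buckets cover small distances one by one; the
    remaining buckets are log-spaced up to max_distance, and anything
    beyond that falls into the last bucket.
    """
    exact = num_buckets // 2
    if distance < exact:
        return int(distance)
    log_pos = math.log(distance / exact) / math.log(max_distance / exact)
    return min(num_buckets - 1, exact + int(log_pos * (num_buckets - exact)))


# Two tokens that are 9 positions apart, in an input where each word is
# split into 3 subword tokens on average versus an unfragmented input.
ratio = fragmentation_ratio(num_subword_tokens=12, num_words=4)
fragmented_bucket = bias_bucket(adjusted_distance(0, 9, ratio))
plain_bucket = bias_bucket(adjusted_distance(0, 9, 1.0))
```

Under these assumptions, the same token offset lands in a nearer bucket when the input is more fragmented, so a learned bias table indexed by bucket would treat the two tokens as closer together than their raw indices suggest.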
Anthology ID:
2026.acl-long.1696
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
36591–36602
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1696/
Cite (ACL):
Khotso Selialia, Antoine Nzeyimana, and Fatima M. Anwar. 2026. Mitigating Tokenization-Induced Distance Distortion in Long-Context Multilingual Machine Translation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36591–36602, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Mitigating Tokenization-Induced Distance Distortion in Long-Context Multilingual Machine Translation (Selialia et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1696.pdf
Checklist:
 2026.acl-long.1696.checklist.pdf