System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task
Joseph Attieh, Zachary Hopton, Yves Scherrer, Tanja Samardžić
Abstract
This paper describes the system submitted by the NordicsAlps team to the AmericasNLP 2024 Machine Translation Shared Task 1. We investigate how tokenization affects translation quality by comparing two tokenization schemes: byte-level and redundancy-driven tokenization. We submitted three runs per language pair. Our system using redundancy-driven tokenization ranked first among all submissions, achieving the highest chrF2++, chrF, and BLEU scores averaged across all languages. These findings show the importance of carefully tailoring the tokenization strategy of machine translation systems, particularly in resource-constrained scenarios.
- Anthology ID:
- 2024.americasnlp-1.18
- Volume:
- Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Manuel Mager, Abteen Ebrahimi, Shruti Rijhwani, Arturo Oncevay, Luis Chiruzzo, Robert Pugh, Katharina von der Wense
- Venues:
- AmericasNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 150–158
- URL:
- https://aclanthology.org/2024.americasnlp-1.18
- Cite (ACL):
- Joseph Attieh, Zachary Hopton, Yves Scherrer, and Tanja Samardžić. 2024. System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task. In Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024), pages 150–158, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- System Description of the NordicsAlps Submission to the AmericasNLP 2024 Machine Translation Shared Task (Attieh et al., AmericasNLP-WS 2024)
- PDF:
- https://preview.aclanthology.org/fix-volume-bibkeys/2024.americasnlp-1.18.pdf