The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline

Yves Scherrer; Aleksandra Miletić; Olli Kuparinen

doi:10.18653/v1/2023.arabicnlp-1.73

Abstract

The Helsinki-NLP team participated in the NADI 2023 shared tasks on Arabic dialect translation with seven submissions. We used statistical (SMT) and neural machine translation (NMT) methods and explored character- and subword-based data preprocessing. Our submissions placed second in both tracks. In the open track, our best submission is a character-level SMT system with additional Modern Standard Arabic language models. In the closed track, our best BLEU scores were obtained with the leave-as-is baseline, a simple copy of the input, narrowly followed by SMT systems. In both tracks, fine-tuning existing multilingual models such as AraT5 or ByT5 did not outperform SMT.
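To make the two simplest ingredients named in the abstract concrete, the following is a minimal sketch of character-level preprocessing and of a leave-as-is baseline. It is not the authors' code: the function names, the boundary marker `▁`, and the exact segmentation scheme are assumptions chosen for illustration, since the abstract does not specify them.

```python
# Hypothetical word-boundary marker; the abstract does not say which symbol,
# if any, the authors used.
BOUNDARY = "▁"


def to_char_level(sentence: str) -> str:
    """Split a sentence into space-separated characters.

    Original spaces are replaced by BOUNDARY so that word boundaries
    can be restored after translation.
    """
    words = sentence.split()
    # Joining the marked string with spaces puts one character per token.
    return " ".join(BOUNDARY.join(words))


def from_char_level(segmented: str) -> str:
    """Invert to_char_level: drop token separators, restore word spaces."""
    return segmented.replace(" ", "").replace(BOUNDARY, " ")


def leave_as_is(source_sentences):
    """Leave-as-is baseline: the 'translation' is a copy of the input."""
    return list(source_sentences)
```

A character-level system would be trained on the output of `to_char_level` and its hypotheses passed through `from_char_level` before scoring. The baseline involves no model at all, so any trained system has to beat an unchanged copy of the dialectal input.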

Anthology ID: 2023.arabicnlp-1.73
Volume: Proceedings of ArabicNLP 2023
Month: December
Year: 2023
Address: Singapore (Hybrid)
Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues: ArabicNLP | WS
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 670–677
URL: https://preview.aclanthology.org/ingest-emnlp/2023.arabicnlp-1.73/
DOI: 10.18653/v1/2023.arabicnlp-1.73
Cite (ACL): Yves Scherrer, Aleksandra Miletić, and Olli Kuparinen. 2023. The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline. In Proceedings of ArabicNLP 2023, pages 670–677, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal): The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline (Scherrer et al., ArabicNLP 2023)
PDF: https://preview.aclanthology.org/ingest-emnlp/2023.arabicnlp-1.73.pdf
