Mana Baladi


2025

PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages
Sina Ahmadi | Rico Sennrich | Erfan Karami | Ako Marani | Parviz Fekrazad | Gholamreza Akbarzadeh Baghban | Hanah Hadi | Semko Heidari | Mahîr Dogan | Pedram Asadi | Dashne Bashir | Mohammad Amin Ghodrati | Kourosh Amini | Zeynab Ashourinezhad | Mana Baladi | Farshid Ezzati | Alireza Ghasemifar | Daryoush Hosseinpour | Behrooz Abbaszadeh | Amin Hassanpour | Bahaddin Jalal Hamaamin | Saya Kamal Hama | Ardeshir Mousavi | Sarko Nazir Hussein | Isar Nejadgholi | Mehmet Ölmez | Horam Osmanpour | Rashid Roshan Ramezani | Aryan Sediq Aziz | Ali Salehi | Mohammadreza Yadegari | Kewyar Yadegari | Sedighe Zamani Roodsari
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The Middle East is characterized by remarkable linguistic diversity, with over 400 million inhabitants speaking more than 60 languages across multiple language families. This study presents PARME, the first parallel corpora for eight severely under-resourced varieties in the region, and addresses fundamental challenges of low-resource scenarios, including non-standardized writing and dialectal complexity. Through an extensive community-driven initiative, volunteers contributed over 36,000 translated sentences, a significant milestone in resource development for these languages. We evaluate machine translation capabilities through zero-shot approaches and fine-tuning experiments with pretrained machine translation models, and provide a comprehensive analysis of limitations. Our findings reveal significant gaps in existing technologies for processing the selected languages, highlighting critical areas for improvement in language technology for Middle Eastern languages.