Farshid Ezzati
2025
PARME: Parallel Corpora for Low-Resourced Middle Eastern Languages
Sina Ahmadi | Rico Sennrich | Erfan Karami | Ako Marani | Parviz Fekrazad | Gholamreza Akbarzadeh Baghban | Hanah Hadi | Semko Heidari | Mahîr Dogan | Pedram Asadi | Dashne Bashir | Mohammad Amin Ghodrati | Kourosh Amini | Zeynab Ashourinezhad | Mana Baladi | Farshid Ezzati | Alireza Ghasemifar | Daryoush Hosseinpour | Behrooz Abbaszadeh | Amin Hassanpour | Bahaddin Jalal Hamaamin | Saya Kamal Hama | Ardeshir Mousavi | Sarko Nazir Hussein | Isar Nejadgholi | Mehmet Ölmez | Horam Osmanpour | Rashid Roshan Ramezani | Aryan Sediq Aziz | Ali Salehi Sheikhalikelayeh | Mohammadreza Yadegari | Kewyar Yadegari | Sedighe Zamani Roodsari
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The Middle East is characterized by remarkable linguistic diversity, with over 400 million inhabitants speaking more than 60 languages across multiple language families. This study presents PARME, the first parallel corpora for eight severely under-resourced varieties in the region, addressing fundamental challenges in low-resource scenarios, including non-standardized writing and dialectal complexity. Through an extensive community-driven initiative, volunteers contributed over 36,000 translated sentences, marking a significant milestone in resource development. We evaluate machine translation capabilities through zero-shot approaches and fine-tuning experiments with pretrained machine translation models and provide a comprehensive analysis of limitations. Our findings reveal significant gaps in existing technologies for processing the selected languages, highlighting critical areas for improvement in language technology for Middle Eastern languages.
Co-authors
- Behrooz Abbaszadeh 1
- Sina Ahmadi 1
- Kourosh Amini 1
- Pedram Asadi 1
- Zeynab Ashourinezhad 1
- Aryan Sediq Aziz 1
- Gholamreza Akbarzadeh Baghban 1
- Mana Baladi 1
- Dashne Bashir 1
- Mahîr Dogan 1
- Parviz Fekrazad 1
- Alireza Ghasemifar 1
- Mohammad Amin Ghodrati 1
- Hanah Hadi 1
- Saya Kamal Hama 1
- Bahaddin Jalal Hamaamin 1
- Amin Hassanpour 1
- Semko Heidari 1
- Daryoush Hosseinpour 1
- Sarko Nazir Hussein 1
- Erfan Karami 1
- Ako Marani 1
- Ardeshir Mousavi 1
- Isar Nejadgholi 1
- Horam Osmanpour 1
- Rashid Roshan Ramezani 1
- Sedighe Zamani Roodsari 1
- Ali Salehi Sheikhalikelayeh 1
- Rico Sennrich 1
- Mohammadreza Yadegari 1
- Kewyar Yadegari 1
- Mehmet Ölmez 1
Venues
- acl 1