Abstract
Zero-shot multi-speaker text-to-speech (ZS-TTS) systems have advanced for English; however, progress for other languages still lags behind due to insufficient resources. We address this gap for Arabic, a language of more than 450 million native speakers, by first adapting a sizeable existing dataset to suit the needs of speech synthesis. Additionally, we employ a set of Arabic dialect identification models to explore the impact of pre-defined dialect labels on improving the ZS-TTS model in a multi-dialect setting. We then fine-tune XTTS, an open-source architecture, and evaluate our models on a dataset comprising 31 unseen speakers and on an in-house dialectal dataset. Our automated and human evaluation results show convincing performance, and the models are capable of generating dialectal speech. Our study highlights significant potential for improvements in this emerging area of research in Arabic.

- Anthology ID: 2024.arabicnlp-1.11
- Volume: Proceedings of The Second Arabic Natural Language Processing Conference
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
- Venues: ArabicNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 123–129
- URL: https://aclanthology.org/2024.arabicnlp-1.11
- DOI: 10.18653/v1/2024.arabicnlp-1.11
- Cite (ACL): Khai Doan, Abdul Waheed, and Muhammad Abdul-Mageed. 2024. Towards Zero-Shot Text-To-Speech for Arabic Dialects. In Proceedings of The Second Arabic Natural Language Processing Conference, pages 123–129, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Towards Zero-Shot Text-To-Speech for Arabic Dialects (Doan et al., ArabicNLP-WS 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2024.arabicnlp-1.11.pdf