Abstract
Handling gender correctly in generative tasks (e.g., Machine Translation) has been an overlooked issue in Arabic NLP. However, the recent introduction of the Arabic Parallel Gender Corpus (APGC) dataset has established new baselines for the Arabic Gender Rewriting task. To address the Gender Rewriting task, we first pre-train our new Seq2Seq ArabicT5 model on 17GB of Arabic corpora. Then, we continue pre-training our ArabicT5 model on the APGC dataset using a newly proposed method. Our evaluation shows that our ArabicT5 model, when trained on the APGC dataset, achieves competitive results against existing state-of-the-art methods. In addition, our ArabicT5 model shows better results on the APGC dataset than other Arabic and multilingual T5 models.
- Anthology ID:
- 2022.wanlp-1.55
- Volume:
- Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 491–495
- URL:
- https://aclanthology.org/2022.wanlp-1.55
- Cite (ACL):
- Sultan Alrowili and Vijay Shanker. 2022. Generative Approach for Gender-Rewriting Task with ArabicT5. In Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP), pages 491–495, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- Generative Approach for Gender-Rewriting Task with ArabicT5 (Alrowili & Shanker, WANLP 2022)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2022.wanlp-1.55.pdf