AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization
Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, Michalis Vazirgiannis
Abstract
Like most natural language understanding and generation tasks, state-of-the-art models for summarization are transformer-based sequence-to-sequence architectures that are pretrained on large corpora. While most existing models focus on English, Arabic remains understudied. In this paper we propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART. We show that AraBART achieves the best performance on multiple abstractive summarization datasets, outperforming strong baselines including a pretrained Arabic BERT-based model, multilingual BART, Arabic T5, and a multilingual T5 model. AraBART is publicly available.

- Anthology ID: 2022.wanlp-1.4
- Volume: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates (Hybrid)
- Venue: WANLP
- Publisher: Association for Computational Linguistics
- Pages: 31–42
- URL: https://aclanthology.org/2022.wanlp-1.4
- DOI: 10.18653/v1/2022.wanlp-1.4
- Cite (ACL): Moussa Kamal Eddine, Nadi Tomeh, Nizar Habash, Joseph Le Roux, and Michalis Vazirgiannis. 2022. AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 31–42, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal): AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization (Kamal Eddine et al., WANLP 2022)
- PDF: https://aclanthology.org/2022.wanlp-1.4.pdf
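The abstract notes that AraBART is publicly available. Below is a minimal sketch of how such a sequence-to-sequence checkpoint could be loaded and used for abstractive summarization with the Hugging Face transformers library. The Hub identifier "moussaKam/AraBART" is an assumption, not taken from this page; in practice the pretrained model would first be fine-tuned on an Arabic summarization dataset (as done in the paper), or a fine-tuned checkpoint would be used instead.

```python
# Hypothetical usage sketch, not the authors' official example.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "moussaKam/AraBART"  # assumed Hub identifier; verify against the official release
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# An Arabic article to summarize (placeholder text for illustration).
article = "نص المقال العربي المراد تلخيصه."

# BART-style encoders typically accept up to 1024 tokens.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)

# Beam search decoding, a common choice for abstractive summarization.
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_length=128,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```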