Exploring Multitask Learning for Low-Resource Abstractive Summarization

Ahmed Magooda, Diane Litman, Mohamed Elaraby


Abstract
This paper explores the effect of using multitask learning for abstractive summarization in the context of small training corpora. In particular, we incorporate four different tasks (extractive summarization, language modeling, concept detection, and paraphrase detection) both individually and in combination, with the goal of enhancing the target task of abstractive summarization via multitask learning. We show that for many task combinations, a model trained in a multitask setting outperforms a model trained only for abstractive summarization, with no additional summarization data introduced. Additionally, we conduct a comprehensive search and find that certain tasks (e.g., paraphrase detection) consistently benefit abstractive summarization, not only when combined with other tasks but also when using different architectures and training corpora.
Anthology ID:
2021.findings-emnlp.142
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1652–1661
URL:
https://aclanthology.org/2021.findings-emnlp.142
DOI:
10.18653/v1/2021.findings-emnlp.142
Cite (ACL):
Ahmed Magooda, Diane Litman, and Mohamed Elaraby. 2021. Exploring Multitask Learning for Low-Resource Abstractive Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1652–1661, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Exploring Multitask Learning for Low-Resource Abstractive Summarization (Magooda et al., Findings 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2021.findings-emnlp.142.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-3/2021.findings-emnlp.142.mp4
Code:
amagooda/multiabs
Data:
CNN/Daily Mail