Multidomain Pretrained Language Models for Green NLP

Antonis Maronikolakis, Hinrich Schütze


Abstract
When tackling a task in a given domain, it has been shown that adapting a model to the domain with raw text before training on the supervised task improves performance over training on the task alone. The downside is that a large amount of domain data is required, and if we want to tackle tasks in n domains, we need n models, each adapted on domain data before task learning. Storing and using these models separately can be prohibitive for low-end devices. In this paper we show that domain adaptation can be generalised to cover multiple domains. Specifically, a single model can be trained across various domains at the same time with minimal drop in performance, even when less data and fewer resources are used. Thus, instead of training multiple models, we can train a single multidomain model, saving computational resources and training time.
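The core idea can be illustrated with a minimal sketch of multidomain adaptive pretraining: raw text from several domains is pooled into one corpus, and a single encoder continues masked-language-model pretraining on it before any task fine-tuning. This is not the authors' exact pipeline; the model choice, file names, and hyperparameters below are placeholder assumptions, using the Hugging Face transformers and datasets libraries.

# Minimal sketch (assumptions: placeholder corpora and model choice).
# Continue MLM pretraining of one model on text pooled from several domains,
# so a single checkpoint can later be fine-tuned on tasks from any of them.
from datasets import load_dataset, concatenate_datasets
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"  # any BERT-style encoder would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Pool raw text from multiple domains into one corpus (hypothetical files).
domain_files = ["news_corpus.txt", "reviews_corpus.txt", "biomed_corpus.txt"]
domain_sets = [
    load_dataset("text", data_files=path, split="train") for path in domain_files
]
corpus = concatenate_datasets(domain_sets).shuffle(seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard random masking for the masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="multidomain-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

The resulting single checkpoint replaces n separately adapted models; each downstream task would then be fine-tuned from this one multidomain model.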
Anthology ID:
2021.adaptnlp-1.1
Volume:
Proceedings of the Second Workshop on Domain Adaptation for NLP
Month:
April
Year:
2021
Address:
Kyiv, Ukraine
Venue:
AdaptNLP
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2021.adaptnlp-1.1
Cite (ACL):
Antonis Maronikolakis and Hinrich Schütze. 2021. Multidomain Pretrained Language Models for Green NLP. In Proceedings of the Second Workshop on Domain Adaptation for NLP, pages 1–8, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal):
Multidomain Pretrained Language Models for Green NLP (Maronikolakis & Schütze, AdaptNLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.adaptnlp-1.1.pdf
Code
antmarakis/multidomain_green_nlp
Data
AG News, IMDb Movie Reviews, MultiNLI, PubMed RCT, RealNews, SARC, SciCite, TalkDown