TADA: Efficient Task-Agnostic Domain Adaptation for Transformers

Chia-Chien Hung, Lukas Lange, Jannik Strötgen


Abstract
Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent the catastrophic forgetting that can arise from full domain-adaptive pre-training, approaches such as adapters have been developed. However, these require additional parameters for each layer and are criticized for their limited expressiveness. In this work, we introduce TADA, a novel task-agnostic domain adaptation method which is modular, parameter-efficient, and thus data-efficient. Within TADA, we retrain the embeddings to learn domain-aware input representations and tokenizers for the transformer encoder, while freezing all other parameters of the model. Then, task-specific fine-tuning is performed. We further conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases. Our broad evaluation on 4 downstream tasks for 14 domains across single- and multi-domain setups and high- and low-resource scenarios reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and adapters for domain adaptation, while not introducing additional parameters or complex training steps.
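To illustrate the core idea described in the abstract (retraining only the embeddings on domain data while freezing the rest of the encoder), the sketch below shows one possible realization using Hugging Face Transformers with a BERT-style backbone. This is not the authors' released code; the model name, corpus path, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of embedding-only domain-adaptive MLM training (TADA-style),
# assuming the Hugging Face transformers/datasets APIs and a BERT backbone.
# "bert-base-uncased", "domain_corpus.txt", and all hyperparameters are
# placeholders, not the paper's actual configuration.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Freeze every parameter except the sub-word embedding matrix, so that
# masked-language-model training on domain text only updates the embeddings
# (task-agnostic and modular: one embedding matrix per domain).
for name, param in model.named_parameters():
    param.requires_grad = "word_embeddings" in name

# Domain-specific raw text corpus (placeholder path).
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tada-embeddings",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()

# The domain-adapted embeddings can later be plugged into the frozen encoder
# before standard task-specific fine-tuning.
model.save_pretrained("tada-embeddings/domain-adapted")
```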
Anthology ID:
2023.findings-acl.31
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
487–503
URL:
https://aclanthology.org/2023.findings-acl.31
DOI:
10.18653/v1/2023.findings-acl.31
Cite (ACL):
Chia-Chien Hung, Lukas Lange, and Jannik Strötgen. 2023. TADA: Efficient Task-Agnostic Domain Adaptation for Transformers. In Findings of the Association for Computational Linguistics: ACL 2023, pages 487–503, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
TADA: Efficient Task-Agnostic Domain Adaptation for Transformers (Hung et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-acl.31.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-acl.31.mp4