Mixed-effects transformers for hierarchical adaptation

Julia White, Noah Goodman, Robert Hawkins


Abstract
Language differs dramatically from context to context. To some degree, large language models like GPT-3 account for such variation by conditioning on strings of initial input text, or prompts. However, prompting can be ineffective when contexts are sparse, out-of-sample, or extra-textual. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes (lightweight modules prepended to an input sequence) to account for structured variation in language use. Specifically, we show how the popular class of mixed-effects regression models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it learns contextual variation from minimal data while generalizing well to unseen contexts.
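To make the idea concrete, below is a minimal sketch (not the authors' released code) of a hierarchically-structured prefix in the spirit of the abstract: a shared "fixed-effect" prefix plus per-context "random-effect" offsets, with the offsets dropped out during training so the model is regularized toward the shared prefix and can fall back to it for unseen contexts. All names and hyperparameters here (HierarchicalPrefix, n_contexts, prefix_len, context_dropout) are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of hierarchical prefix-tuning with context-level dropout.
import torch
import torch.nn as nn


class HierarchicalPrefix(nn.Module):
    def __init__(self, n_contexts: int, prefix_len: int = 10,
                 d_model: int = 768, context_dropout: float = 0.1):
        super().__init__()
        # Shared prefix: one set of learned vectors used for every sequence.
        self.shared = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
        # Context-specific offsets: one prefix per context (e.g. per author or domain).
        self.offsets = nn.Embedding(n_contexts, prefix_len * d_model)
        nn.init.zeros_(self.offsets.weight)
        self.prefix_len = prefix_len
        self.d_model = d_model
        self.context_dropout = context_dropout

    def forward(self, input_embeds: torch.Tensor, context_ids: torch.Tensor) -> torch.Tensor:
        """Prepend the (shared + context) prefix to a batch of input embeddings.

        input_embeds: (batch, seq_len, d_model) token embeddings
        context_ids:  (batch,) integer id of the context each sequence came from
        """
        batch = input_embeds.size(0)
        offsets = self.offsets(context_ids).view(batch, self.prefix_len, self.d_model)
        if self.training:
            # Drop entire context offsets with some probability so the shared
            # prefix alone must still explain the data (regularizes toward it).
            keep = (torch.rand(batch, 1, 1, device=offsets.device)
                    > self.context_dropout).float()
            offsets = offsets * keep
        prefix = self.shared.unsqueeze(0) + offsets
        return torch.cat([prefix, input_embeds], dim=1)


# Usage sketch: prepend the prefix to token embeddings before a frozen transformer.
if __name__ == "__main__":
    prefixer = HierarchicalPrefix(n_contexts=50)
    embeds = torch.randn(4, 20, 768)        # dummy token embeddings
    ctx = torch.tensor([3, 3, 17, 42])      # context id for each sequence
    print(prefixer(embeds, ctx).shape)      # torch.Size([4, 30, 768])
```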
Anthology ID:
2022.emnlp-main.261
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3944–3954
URL:
https://aclanthology.org/2022.emnlp-main.261
Cite (ACL):
Julia White, Noah Goodman, and Robert Hawkins. 2022. Mixed-effects transformers for hierarchical adaptation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3944–3954, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Mixed-effects transformers for hierarchical adaptation (White et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.261.pdf