Abstract
Some variants of self-supervised denoising objectives for pre-training encoder-decoder language models have been reported to have a negligible impact on downstream performance. Yet the design of these pre-training objectives leads to behavioural differences that can be uncovered with specific manipulations. We reproduce a recently proposed zero-shot control method and find that it is only successful on a subset of models. To understand what causes the difference in its effectiveness, we perform a set of controlled experiments, varying only the pre-training objective, and find unexpected interactions between the pre-training method and downstream controllability of models after fine-tuning. Our results show that different pre-training objectives have consequences that may not be visible in standard downstream evaluation, but which should be taken into account when developing models with controllability in mind.

- Anthology ID: 2023.findings-acl.438
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 7010–7022
- URL: https://aclanthology.org/2023.findings-acl.438
- DOI: 10.18653/v1/2023.findings-acl.438
- Cite (ACL): Tannon Kew and Rico Sennrich. 2023. Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7010–7022, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Uncovering Hidden Consequences of Pre-training Objectives in Sequence-to-Sequence Models (Kew & Sennrich, Findings 2023)
- PDF: https://preview.aclanthology.org/landing_page/2023.findings-acl.438.pdf