Abstract
State-of-the-art language models like T5 have revolutionized the NLP landscape, but their computational demands hinder a large portion of the research community. To address this challenge, we present nanoT5, a specially optimized PyTorch framework for efficient pre-training and fine-tuning of T5 models. Drawing on insights from optimizer differences and prioritizing efficiency, nanoT5 allows a T5-Base model to be pre-trained on a single GPU in just 16 hours, without any loss in performance. With the introduction of this open-source framework, we hope to widen access to language modelling research and cater to the community's demand for more user-friendly T5 (Encoder-Decoder) implementations. We make our contributions, including configurations, codebase, pre-training insights, and pre-trained models, available to the public.
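For context, the sketch below shows what a single pre-training step for a randomly initialised T5-Base looks like using the HuggingFace transformers API. It is a minimal, hypothetical illustration: the checkpoint name, learning rate, toy batch, and use of plain AdamW are assumptions for demonstration and are not taken from the nanoT5 codebase, whose actual training loop, configuration system, and optimizer variant differ.

```python
import torch
from transformers import AutoTokenizer, T5Config, T5ForConditionalGeneration

# Illustrative only: nanoT5's actual loop, config, and optimizer differ.
# Randomly initialised T5-Base (~220M parameters), pre-trained from scratch
# rather than loaded from released weights.
config = T5Config.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration(config).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

# One step of the span-corruption objective on a toy example:
# the sentinel token <extra_id_0> marks the masked span.
inputs = tokenizer(
    "nanoT5 pre-trains <extra_id_0> on a single GPU.",
    return_tensors="pt",
).to("cuda")
labels = tokenizer(
    "<extra_id_0> a T5-Base model", return_tensors="pt"
).input_ids.to("cuda")

# Plain AdamW stands in here for whichever optimizer variant the framework uses.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```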
- Anthology ID: 2023.nlposs-1.11
- Volume: Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Liling Tan, Dmitrijs Milajevs, Geeticka Chauhan, Jeremy Gwinnup, Elijah Rippeth
- Venues: NLPOSS | WS
- Publisher: Association for Computational Linguistics
- Pages: 95–101
- URL: https://aclanthology.org/2023.nlposs-1.11
- DOI: 10.18653/v1/2023.nlposs-1.11
- Cite (ACL): Piotr Nawrot. 2023. nanoT5: Fast & Simple Pre-training and Fine-tuning of T5 Models with Limited Resources. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 95–101, Singapore. Association for Computational Linguistics.
- Cite (Informal): nanoT5: Fast & Simple Pre-training and Fine-tuning of T5 Models with Limited Resources (Nawrot, NLPOSS-WS 2023)
- PDF: https://preview.aclanthology.org/ingest-bitext-workshop/2023.nlposs-1.11.pdf