GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sidney Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
Abstract
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B’s architecture and training, and evaluate its performance. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
- Anthology ID:
- 2022.bigscience-1.9
- Volume:
- Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
- Month:
- May
- Year:
- 2022
- Address:
- virtual+Dublin
- Venue:
- BigScience
- Publisher:
- Association for Computational Linguistics
- Pages:
- 95–136
- URL:
- https://aclanthology.org/2022.bigscience-1.9
- DOI:
- 10.18653/v1/2022.bigscience-1.9
- Cite (ACL):
- Sidney Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. In Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models, pages 95–136, virtual+Dublin. Association for Computational Linguistics.
- Cite (Informal):
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model (Black et al., BigScience 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.bigscience-1.9.pdf
- Code:
- eleutherai/gpt-neox
- Data:
- ARC, HellaSwag, LAMBADA, LogiQA, MATH, MMLU, PIQA, PROST, SuperGLUE, The Pile