Jonathan Tow


2023

trlX: A Framework for Large Scale Reinforcement Learning from Human Feedback
Alexander Havrilla | Maksym Zhuravinskyi | Duy Phung | Aman Tiwari | Jonathan Tow | Stella Biderman | Quentin Anthony | Louis Castricato
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Reinforcement learning from human feedback (RLHF) utilizes human feedback to better align large language models with human preferences via online optimization against a learned reward model. Current RLHF paradigms rely on Proximal Policy Optimization (PPO), which quickly becomes a challenge to implement and scale up to large architectures. To address this difficulty, we present the AutoRLHF library as a feature-complete open-source framework for RLHF fine-tuning of models up to and exceeding 70 billion parameters. To do so, we implement support for multiple types of distributed training, including distributed data parallel and model-sharded training, as well as tensor, sequential, and pipeline parallelism. Additionally, we implement compute- and memory-saving features, giving AutoRLHF the flexibility to support users with a wide range of compute resources. This includes offline RL methods like Implicit Language Q Learning (ILQL) as a compute-efficient alternative to PPO. We find offline fine-tuning offers competitive performance relative to online algorithms while being easier to implement, train, and scale. To evaluate our framework, we train RLHF models on two separate well-known tasks using publicly available human preference data. Models trained with AutoRLHF achieve preference win rates over baselines comparable to those reported in the original works.

2022

GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sidney Black | Stella Biderman | Eric Hallahan | Quentin Anthony | Leo Gao | Laurence Golding | Horace He | Connor Leahy | Kyle McDonell | Jason Phang | Michael Pieler | Usvsn Sai Prashanth | Shivanshu Purohit | Laria Reynolds | Jonathan Tow | Ben Wang | Samuel Weinbach
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B’s architecture and training, and evaluate its performance. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.