Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task
Gaëtan Caillaut, Mariam Nakhlé, Raheel Qader, Jingshu Liu, Jean-Gabriel Barthélemy
Abstract
Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer architecture. As a consequence, scaling laws of encoder-decoder models for neural machine translation have already been well studied, but decoder-only models have received less attention. This work explores the scaling laws of decoder-only models on the multilingual and multidomain translation task. We trained a collection of six decoder-only models, ranging from 70M to 7B parameters, on a sentence-level, multilingual (8 languages) and multidomain (9 domains) dataset. We conducted a series of experiments showing that the loss of decoder-only models can be estimated using a scaling law similar to the one discovered for large language models, but we also show that this scaling law generalizes poorly to models that are too large or to a different data distribution. We also study different scaling methods and show that scaling the depth and scaling the width of a model lead to similar test loss improvements, but with different impacts on the model’s efficiency.
- Anthology ID:
- 2024.wmt-1.124
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venues:
- WMT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1318–1331
- URL:
- https://preview.aclanthology.org/moar-dois/2024.wmt-1.124/
- DOI:
- 10.18653/v1/2024.wmt-1.124
- Cite (ACL):
- Gaëtan Caillaut, Mariam Nakhlé, Raheel Qader, Jingshu Liu, and Jean-Gabriel Barthélemy. 2024. Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 1318–1331, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task (Caillaut et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/moar-dois/2024.wmt-1.124.pdf