Investigating the translation capabilities of Large Language Models trained on parallel data only
Javier García Gilabert, Carlos Escolano, Aleix Sant, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero
Abstract
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (Parallel Language Model), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones. Utilizing this set of models, we conduct a thorough investigation into the translation capabilities of LLMs, probing their performance, the role of vocabulary size, the impact of the different elements of the prompt, and their cross-lingual representation space. We find that larger vocabulary sizes improve zero-shot performance and that different layers specialize in distinct aspects of the prompt, such as language-specific tags. We further show that as the vocabulary size grows, a larger number of attention heads can be pruned with minimal loss in translation quality, achieving a reduction of over 64.7% in attention heads.
- Anthology ID:
- 2025.mtsummit-1.4
- Volume:
- Proceedings of Machine Translation Summit XX: Volume 1
- Month:
- June
- Year:
- 2025
- Address:
- Geneva, Switzerland
- Editors:
- Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
- Venue:
- MTSummit
- Publisher:
- European Association for Machine Translation
- Pages:
- 24–53
- URL:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.4/
- Cite (ACL):
- Javier García Gilabert, Carlos Escolano, Aleix Sant, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, and Maite Melero. 2025. Investigating the translation capabilities of Large Language Models trained on parallel data only. In Proceedings of Machine Translation Summit XX: Volume 1, pages 24–53, Geneva, Switzerland. European Association for Machine Translation.
- Cite (Informal):
- Investigating the translation capabilities of Large Language Models trained on parallel data only (Gilabert et al., MTSummit 2025)
- PDF:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.4.pdf