Learning Is Not A Race: Improving Retrieval in Language Models via Equal Learning

Wanqian Yang, Aahlad Manas Puli, Rajesh Ranganath


Abstract
Many applications that modern large language models (LLMs) are deployed on are retrieval tasks: the answer can be recovered from context, and success is a matter of learning generalizable features from data. However, this is easier said than done. Overparameterized models trained with cross-entropy loss can overfit to noise. We argue that such overfitting is prone to happen when the model identifies mechanisms that rapidly drive down the loss of certain tokens early in training. Fitting some tokens early reduces gradient signals in later iterations; as a result, the remaining tokens become more vulnerable to noise overfitting. We dub this phenomenon unequal learning and show that LLMs with longer contexts or larger embedding sizes are prone to this failure mode. In this work, we argue that learning training samples at an equal rate helps counter such biases. We highlight two mechanisms that promote equal learning: (i) loss functions that regularize margins to be uniform across training samples, and (ii) small learning rates (e.g., via warmup) at the start of training. We demonstrate these approaches on various synthetic and natural language datasets.
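
The two mechanisms named in the abstract lend themselves to a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the paper's implementation: a cross-entropy objective augmented with a penalty on the spread of per-sample margins, plus a linear learning-rate warmup. The margin definition, the variance penalty, and all hyperparameters (lam, warmup_steps) are assumptions made purely for illustration.

import torch
import torch.nn.functional as F

# Hypothetical margin-uniformity loss (illustrative, not the paper's):
# cross-entropy plus a penalty on the variance of per-sample margins,
# where a margin is the correct-class logit minus the best competing logit.
def margin_uniform_loss(logits, targets, lam=0.1):
    ce = F.cross_entropy(logits, targets)
    correct = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Mask out the correct class, then take the largest remaining logit.
    others = logits.scatter(1, targets.unsqueeze(1), float("-inf"))
    margins = correct - others.max(dim=1).values
    return ce + lam * margins.var()

# Linear warmup: keep the learning rate small at the start of training.
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    return base_lr * min(1.0, (step + 1) / warmup_steps)

# Usage sketch: set the per-step learning rate before each optimizer update.
# for step, (x, y) in enumerate(loader):
#     for g in optimizer.param_groups:
#         g["lr"] = warmup_lr(step)
#     loss = margin_uniform_loss(model(x), y)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()

Both pieces target the same failure mode described above: the variance penalty discourages any subset of samples from racing ahead of the rest, and the warmup keeps early updates small so no mechanism can rapidly drive down the loss of a few tokens at the outset.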
Anthology ID:
2025.findings-emnlp.1260
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23200–23211
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1260/
DOI:
10.18653/v1/2025.findings-emnlp.1260
Cite (ACL):
Wanqian Yang, Aahlad Manas Puli, and Rajesh Ranganath. 2025. Learning Is Not A Race: Improving Retrieval in Language Models via Equal Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23200–23211, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Learning Is Not A Race: Improving Retrieval in Language Models via Equal Learning (Yang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1260.pdf
Checklist:
2025.findings-emnlp.1260.checklist.pdf