MTAdam: Automatic Balancing of Multiple Training Loss Terms

Itzik Malkiel, Lior Wolf


Abstract
When training neural models, it is common to combine multiple loss terms. Balancing these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, e.g., for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that, at every layer, the gradient magnitudes of the terms should be balanced. To this end, Multi-Term Adam (MTAdam) computes the derivative of each loss term separately, infers the first and second moments per parameter and per loss term, and calculates the first moment of the per-layer gradient magnitude arising from each loss. This magnitude is used to continuously balance the gradients across all layers, in a manner that both varies from one layer to the next and changes dynamically over time. Our results show that training with the new method leads to fast recovery from suboptimal initial loss weighting and to training outcomes that match or improve upon conventional training with the prescribed hyperparameters of each method.
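As a rough illustration of the balancing idea described above, the following NumPy sketch shows one possible interpretation of a single MTAdam-style update step. All names here (mtadam_step, beta3, anchoring every term to the first term's magnitude) are illustrative assumptions, not the authors' implementation; the actual code is in the linked ItzikMalkiel/MTAdam repository.

    # A minimal sketch of per-layer, per-term gradient balancing, assuming
    # gradients for each loss term have already been computed separately.
    import numpy as np

    def mtadam_step(params, grads_per_term, state, lr=1e-3,
                    beta1=0.9, beta2=0.999, beta3=0.9, eps=1e-8):
        """One hypothetical update step.

        params:          list of per-layer parameter arrays.
        grads_per_term:  grads_per_term[t][l] is the gradient of loss term t
                         w.r.t. layer l (same shape as params[l]).
        state:           dict holding optimizer moments; filled on first call.
        """
        T, L = len(grads_per_term), len(params)
        if not state:
            state.update(
                step=0,
                m=[[np.zeros_like(p) for p in params] for _ in range(T)],  # 1st moments
                v=[[np.zeros_like(p) for p in params] for _ in range(T)],  # 2nd moments
                # Running per-layer gradient-magnitude moment per term;
                # initialized to 1.0 for simplicity (no bias correction here).
                mag=[[1.0] * L for _ in range(T)],
            )
        state["step"] += 1
        step = state["step"]

        for l, p in enumerate(params):
            update = np.zeros_like(p)
            # Track a running first-moment estimate of each term's gradient norm.
            for term in range(T):
                g_norm = np.linalg.norm(grads_per_term[term][l])
                state["mag"][term][l] = (beta3 * state["mag"][term][l]
                                         + (1 - beta3) * g_norm)
            # Assumption: the first term's magnitude serves as the anchor.
            anchor = state["mag"][0][l]
            for term in range(T):
                # Rescale this term's gradient so its per-layer magnitude
                # matches the anchor, then apply an Adam-style update.
                g = grads_per_term[term][l] * (anchor / (state["mag"][term][l] + eps))
                state["m"][term][l] = beta1 * state["m"][term][l] + (1 - beta1) * g
                state["v"][term][l] = beta2 * state["v"][term][l] + (1 - beta2) * g * g
                m_hat = state["m"][term][l] / (1 - beta1 ** step)
                v_hat = state["v"][term][l] / (1 - beta2 ** step)
                update += m_hat / (np.sqrt(v_hat) + eps)
            params[l] = p - lr * update
        return params

In a real training loop, grads_per_term would be obtained by backpropagating each loss term separately (e.g., one backward pass per term in PyTorch) and collecting the per-layer gradients before calling the step.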
Anthology ID:
2021.emnlp-main.837
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10713–10729
URL:
https://aclanthology.org/2021.emnlp-main.837
DOI:
10.18653/v1/2021.emnlp-main.837
Cite (ACL):
Itzik Malkiel and Lior Wolf. 2021. MTAdam: Automatic Balancing of Multiple Training Loss Terms. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10713–10729, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
MTAdam: Automatic Balancing of Multiple Training Loss Terms (Malkiel & Wolf, EMNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2021.emnlp-main.837.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-4/2021.emnlp-main.837.mp4
Code
ItzikMalkiel/MTAdam
Data
BSD
Set14