A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation

Gayatri Bhat, Sachin Kumar, Yulia Tsvetkov


Abstract
Neural models that eliminate the softmax bottleneck by generating word embeddings (rather than multinomial distributions over a vocabulary) attain faster training with fewer learnable parameters. These models are currently trained by maximizing densities of pretrained target embeddings under von Mises-Fisher distributions parameterized by corresponding model-predicted embeddings. This work explores the utility of margin-based loss functions in optimizing such models. We present syn-margin loss, a novel margin-based loss that uses a synthetic negative sample constructed from only the predicted and target embeddings at every step. The loss is efficient to compute, and we use a geometric analysis to argue that it is more consistent and interpretable than other margin-based losses. Empirically, we find that syn-margin provides small but significant improvements over both vMF and standard margin-based losses in continuous-output neural machine translation.
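To make the idea concrete, here is a minimal PyTorch sketch of a margin-based loss whose negative sample is synthesized from only the predicted and target embeddings, as the abstract describes. The orthogonal-residual construction of the negative and the margin value are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def syn_margin_loss(pred: torch.Tensor, target: torch.Tensor,
                    margin: float = 0.5) -> torch.Tensor:
    """Margin loss with a synthetic negative sample.

    pred, target: (batch, dim) model-predicted and pretrained target
    embeddings. The negative is built from these two vectors alone
    (no vocabulary lookup); here it is the component of the prediction
    orthogonal to the target direction -- an assumed construction.
    """
    pred_n = F.normalize(pred, dim=-1)
    target_n = F.normalize(target, dim=-1)
    # Project the prediction onto the target direction, then keep
    # the orthogonal residual as the synthetic negative.
    proj = (pred_n * target_n).sum(-1, keepdim=True) * target_n
    negative = F.normalize(pred_n - proj, dim=-1)
    pos_sim = (pred_n * target_n).sum(-1)   # cos(pred, target)
    neg_sim = (pred_n * negative).sum(-1)   # cos(pred, negative)
    # Hinge: push the target at least `margin` closer than the negative.
    return F.relu(margin - pos_sim + neg_sim).mean()
```

Because the negative is derived analytically at each step, the loss needs no sampling from the vocabulary, which is the source of the efficiency claim in the abstract.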
Anthology ID:
D19-5621
Volume:
Proceedings of the 3rd Workshop on Neural Generation and Translation
Month:
November
Year:
2019
Address:
Hong Kong
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
199–205
URL:
https://aclanthology.org/D19-5621
DOI:
10.18653/v1/D19-5621
Cite (ACL):
Gayatri Bhat, Sachin Kumar, and Yulia Tsvetkov. 2019. A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 199–205, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation (Bhat et al., NGT 2019)
PDF:
https://aclanthology.org/D19-5621.pdf