Abstract
Neural models that eliminate the softmax bottleneck by generating word embeddings (rather than multinomial distributions over a vocabulary) attain faster training with fewer learnable parameters. These models are currently trained by maximizing densities of pretrained target embeddings under von Mises-Fisher distributions parameterized by corresponding model-predicted embeddings. This work explores the utility of margin-based loss functions in optimizing such models. We present syn-margin loss, a novel margin-based loss that uses a synthetic negative sample constructed from only the predicted and target embeddings at every step. The loss is efficient to compute, and we use a geometric analysis to argue that it is more consistent and interpretable than other margin-based losses. Empirically, we find that syn-margin provides small but significant improvements over both vMF and standard margin-based losses in continuous-output neural machine translation.
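The abstract does not spell out how the synthetic negative is built, only that it is derived from the predicted and target embeddings alone. As a minimal sketch under that constraint, one plausible (hypothetical) construction takes the negative to be the unit vector along the component of the prediction orthogonal to the target; the function name `syn_margin_loss` and the margin value are likewise illustrative, not the paper's exact formulation:

```python
import numpy as np

def syn_margin_loss(pred: np.ndarray, target: np.ndarray, margin: float = 0.5) -> float:
    """Margin loss with a synthetic negative built from pred and target only.

    Hypothetical construction: the negative is the unit vector along the
    component of the prediction orthogonal to the target, i.e. the nearest
    "wrong direction" implied by the current prediction.
    """
    pred = pred / np.linalg.norm(pred)            # work on the unit sphere,
    target = target / np.linalg.norm(target)      # as vMF-based training does
    ortho = pred - np.dot(pred, target) * target  # reject the target direction
    neg = ortho / (np.linalg.norm(ortho) + 1e-8)  # synthetic negative sample
    # Hinge on the gap between similarity to the target and to the negative.
    return max(0.0, margin - np.dot(pred, target) + np.dot(pred, neg))

# Example: a prediction close to the target incurs little or no loss.
rng = np.random.default_rng(0)
target = rng.normal(size=300)
pred = target + 0.1 * rng.normal(size=300)
print(syn_margin_loss(pred, target))
```

Because the negative is computed in closed form from the two embeddings already in hand, no sampling from the vocabulary is needed at any step, which is consistent with the abstract's claim that the loss is efficient to compute.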
- Anthology ID: D19-5621
- Volume: Proceedings of the 3rd Workshop on Neural Generation and Translation
- Month: November
- Year: 2019
- Address: Hong Kong
- Venue: NGT
- Publisher: Association for Computational Linguistics
- Pages: 199–205
- URL: https://aclanthology.org/D19-5621
- DOI: 10.18653/v1/D19-5621
- Cite (ACL): Gayatri Bhat, Sachin Kumar, and Yulia Tsvetkov. 2019. A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 199–205, Hong Kong. Association for Computational Linguistics.
- Cite (Informal): A Margin-based Loss with Synthetic Negative Samples for Continuous-output Machine Translation (Bhat et al., NGT 2019)
- PDF: https://preview.aclanthology.org/ingestion-script-update/D19-5621.pdf