Abstract
Diffusion models have recently shown great potential on many generative tasks. In this work, we explore diffusion models for machine translation (MT). We adapt two prominent diffusion-based text generation models, Diffusion-LM and DiffuSeq, to perform machine translation. As diffusion models generate non-autoregressively (NAR), we draw parallels to NAR machine translation models. Through a comparison with conventional Transformer-based translation models, as well as with the Levenshtein Transformer, an established NAR MT model, we show that the multimodality problem that limits NAR machine translation performance also poses a challenge to diffusion models. We demonstrate that knowledge distillation from an autoregressive model improves the performance of diffusion-based MT. A thorough analysis of translation quality on inputs of different lengths shows that the diffusion models struggle more with long-range dependencies than the other models.
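The distillation result points at the standard recipe for NAR models: sequence-level knowledge distillation, where the training targets are replaced by an autoregressive teacher's beam-search outputs. Below is a minimal sketch of that data-generation step, assuming a Hugging Face MarianMT teacher; the checkpoint name, example sentences, and decoding parameters are illustrative assumptions, not details taken from the paper.

```python
# Sketch of sequence-level knowledge distillation for NAR/diffusion MT:
# translate the training sources with an autoregressive teacher, then train
# the diffusion model on (source, teacher output) pairs instead of the
# original references. Teacher checkpoint and data are assumptions.
from transformers import MarianMTModel, MarianTokenizer

teacher_name = "Helsinki-NLP/opus-mt-de-en"  # hypothetical teacher checkpoint
tokenizer = MarianTokenizer.from_pretrained(teacher_name)
teacher = MarianMTModel.from_pretrained(teacher_name)

sources = ["Ein Beispielsatz.", "Noch ein Satz."]  # training-set source side

inputs = tokenizer(sources, return_tensors="pt", padding=True)
# Beam search commits to a single consistent "mode" per source, which is
# what makes the distilled targets easier for a non-autoregressive student.
outputs = teacher.generate(**inputs, num_beams=5, max_new_tokens=64)
distilled_targets = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# The diffusion model would then be trained on these distilled pairs.
for src, tgt in zip(sources, distilled_targets):
    print(src, "->", tgt)
```

Because each source now maps to exactly one target, the student no longer has to average over several valid translations, which is one common explanation for why distillation mitigates the multimodality problem.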
- Anthology ID: 2024.eacl-srw.25
- Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Neele Falk, Sara Papi, Mike Zhang
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 313–324
- URL: https://preview.aclanthology.org/icon-24-ingestion/2024.eacl-srw.25/
- Cite (ACL): Yunus Demirag, Danni Liu, and Jan Niehues. 2024. Benchmarking Diffusion Models for Machine Translation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 313–324, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): Benchmarking Diffusion Models for Machine Translation (Demirag et al., EACL 2024)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2024.eacl-srw.25.pdf