Abstract
Past works on multimodal machine translation (MMT) elevate the bilingual setup by incorporating additional aligned vision information. However, the image-must requirement of multimodal datasets largely hinders MMT’s development, namely that it demands an aligned form of [image, source text, target text]. This limitation is especially troublesome during the inference phase, when the aligned image is not provided, as in the normal NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports image-free inference via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is executed with a knowledge distillation module, which directly generates the multimodal feature from (only) source texts as the input. While a few prior works have entertained the possibility of supporting image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, we identify our method as the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, and it achieves the state-of-the-art result on the often-used Multi30k benchmark. Our code and data are available at: https://github.com/pengr/IKD-mmt/tree/master.
- Anthology ID:
- 2022.emnlp-main.152
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2379–2390
- URL:
- https://aclanthology.org/2022.emnlp-main.152
- DOI:
- 10.18653/v1/2022.emnlp-main.152
- Cite (ACL):
- Ru Peng, Yawen Zeng, and Jake Zhao. 2022. Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2379–2390, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation (Peng et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2022.emnlp-main.152.pdf