Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

Ru Peng, Yawen Zeng, Jake Zhao


Abstract
Past works on multimodal machine translation (MMT) enhance the bilingual setup by incorporating additional aligned visual information. However, the image-must requirement of multimodal datasets largely hinders MMT's development: it demands data in the aligned form [image, source text, target text]. This limitation is especially troublesome during inference, when no aligned image is provided, as in the normal NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports image-free inference via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is paired with a knowledge distillation module and directly generates the multimodal feature from source texts alone. While a few prior works have explored image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, we identify our method as the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, and it achieves state-of-the-art results on the widely used Multi30k benchmark. Our code and data are available at: https://github.com/pengr/IKD-mmt/tree/master.
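The abstract describes the feature generator only at a high level. As a rough, non-authoritative illustration of the general idea behind image-free inference, the sketch below trains a student generator to regress teacher image features from text features on aligned training pairs, then uses only the text feature at test time. Everything here is an assumption made for illustration: the single linear generator, the mean-squared-error distillation loss, the toy dimensions and data, and the helper names (`matvec`, `distill_loss`, `train_generator`) are hypothetical and are not taken from IKD-MMT, whose actual generator and objectives are described in the paper.

```python
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def distill_loss(W, pairs):
    """Assumed MSE distillation loss between generated features W @ t and teacher features v."""
    total = 0.0
    for t, v in pairs:
        g = matvec(W, t)
        total += sum((gi - vi) ** 2 for gi, vi in zip(g, v))
    return total / len(pairs)


def train_generator(pairs, dim_in, dim_out, lr=0.1, steps=2000, seed=0):
    """Fit a hypothetical linear generator W by gradient descent on distill_loss."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    n = len(pairs)
    for _ in range(steps):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for t, v in pairs:
            g = matvec(W, t)
            for i in range(dim_out):
                err = g[i] - v[i]
                for j in range(dim_in):
                    # Derivative of (g_i - v_i)^2 w.r.t. W_ij, averaged over the n pairs
                    grad[i][j] += 2.0 * err * t[j] / n
        for i in range(dim_out):
            for j in range(dim_in):
                W[i][j] -= lr * grad[i][j]
    return W


# Training: aligned (text feature, teacher image feature) pairs are available.
# The toy "teacher" makes image features an exact linear function of text features.
A = [[1.0, -0.5, 0.2], [0.3, 0.8, -1.0]]
data_rng = random.Random(1)
train_pairs = []
for _ in range(20):
    t = [data_rng.uniform(-1.0, 1.0) for _ in range(3)]
    train_pairs.append((t, matvec(A, t)))

W = train_generator(train_pairs, dim_in=3, dim_out=2)

# Image-free inference: only the source-text feature is given, and the
# generator supplies a stand-in for the missing image feature.
new_text = [0.5, -0.2, 0.9]
generated = matvec(W, new_text)
print(generated)
```

The linear model and squared-error loss are chosen only to keep the example short and checkable; a practical system would use a neural generator over encoder states, and the distillation targets would come from a real visual encoder rather than a synthetic linear teacher.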
Anthology ID: 2022.emnlp-main.152
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2379–2390
URL: https://aclanthology.org/2022.emnlp-main.152
DOI: 10.18653/v1/2022.emnlp-main.152
Cite (ACL): Ru Peng, Yawen Zeng, and Jake Zhao. 2022. Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2379–2390, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation (Peng et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/add_acl24_videos/2022.emnlp-main.152.pdf