% ACL Anthology entry for MML 2022 paper #1; editor name corrected ("Cheng" -> "Chang", i.e. Kai-Wei Chang).
@inproceedings{jung-etal-2022-language,
  title     = {Language-agnostic Semantic Consistent Text-to-Image Generation},
  author    = {Jung, SeongJun and
               Choi, Woo Suk and
               Choi, Seongho and
               Zhang, Byoung-Tak},
  editor    = {Bugliarello, Emanuele and
               Chang, Kai-Wei and
               Elliott, Desmond and
               Gella, Spandana and
               Kamath, Aishwarya and
               Li, Liunian Harold and
               Liu, Fangyu and
               Pfeiffer, Jonas and
               Ponti, Edoardo Maria and
               Srinivasan, Krishna and
               Vuli{\'c}, Ivan and
               Yang, Yinfei and
               Yin, Da},
  booktitle = {Proceedings of the Workshop on Multilingual Multimodal Learning},
  month     = may,
  year      = {2022},
  address   = {Dublin, Ireland and Online},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2022.mml-1.1},
  doi       = {10.18653/v1/2022.mml-1.1},
  pages     = {1--5},
  abstract  = {Recent GAN-based text-to-image generation models have advanced that they can generate photo-realistic images matching semantically with descriptions. However, research on multi-lingual text-to-image generation has not been carried out yet much. There are two problems when constructing a multilingual text-to-image generation model: 1) language imbalance issue in text-to-image paired datasets and 2) generating images that have the same meaning but are semantically inconsistent with each other in texts expressed in different languages. To this end, we propose a Language-agnostic Semantic Consistent Generative Adversarial Network (LaSC-GAN) for text-to-image generation, which can generate semantically consistent images via language-agnostic text encoder and Siamese mechanism. Experiments on relatively low-resource language text-image datasets show that the model has comparable generation quality as images generated by high-resource language text, and generates semantically consistent images for texts with the same meaning even in different languages.},
}
@comment{
Markdown (Informal)
[Language-agnostic Semantic Consistent Text-to-Image Generation](https://aclanthology.org/2022.mml-1.1) (Jung et al., MML 2022)
ACL
}