Jongwon Ryu
2026
Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance
Jongwon Ryu | Joonhyung Park | Jaeho Han | Yeong-Seok Kim | Hye-Rin Kim | Sunjae Yoon | Junyeong Kim
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Jongwon Ryu | Joonhyung Park | Jaeho Han | Yeong-Seok Kim | Hye-Rin Kim | Sunjae Yoon | Junyeong Kim
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-domain image-to-image translation requires grounding semantic differences expressed in natural language prompts into corresponding visual transformations, while preserving unrelated structural and semantic content. Existing methods struggle to maintain structural integrity and provide fine-grained, attribute-specific control, especially when multiple domains are involved. We propose LACE (Language-grounded Attribute-Controllable Translation), built on two components: (1) a GLIP-Adapter that fuses global semantics with local structural features to preserve consistency, and (2) a Multi-Domain Control Guidance mechanism that explicitly grounds the semantic delta between source and target prompts into per-attribute translation vectors, aligning linguistic semantics with domain-level visual changes. Together, these modules enable compositional multi-domain control with independent strength modulation for each attribute. Experiments on CelebA(Dialog) and BDD100K demonstrate that LACE achieves high visual fidelity, structural preservation, and interpretable domain-specific control, surpassing prior baselines. This positions LACE as a cross-modal content generation framework bridging language semantics and controllable visual translation. Code will be publicly available.