Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu


Abstract
Any-to-any generative models aim to enable seamless interpretation and generation across multiple modalities within a unified framework, yet their ability to preserve relationships across modalities remains uncertain. Do unified models truly achieve cross-modal coherence, or is this coherence merely perceived? To explore this, we introduce ACON, a dataset of 1,000 images (500 newly contributed) paired with captions, editing instructions, and Q&A pairs to evaluate cross-modal transfers rigorously. Using three consistency criteria—cyclic consistency, forward equivariance, and conjugated equivariance—our experiments reveal that any-to-any models do not consistently demonstrate greater cross-modal consistency than specialized models in pointwise evaluations such as cyclic consistency. However, equivariance evaluations uncover weak but observable consistency through structured analyses of the intermediate latent space enabled by multiple editing operations. We release our code and data at https://github.com/JiwanChung/ACON.
Anthology ID:
2025.acl-long.130
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2595–2606
Language:
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.130/
DOI:
Bibkey:
Cite (ACL):
Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, and Youngjae Yu. 2025. Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2595–2606, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? (Chung et al., ACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.130.pdf