Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
Elisei Rykov, Kseniia Petrushina, Kseniia Titova, Anton Razzhigaev, Alexander Panchenko, Vasily Konovalov
Abstract
Measuring how real images look is a complex task in artificial intelligence research. For example, an image of Albert Einstein holding a smartphone violates common-sense because modern smartphone were invented after Einstein’s death. We introduce a novel method, which we called Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and Transformer-based encoder. By leveraging LVLM to extract atomic facts from these images, we obtain a mix of accurate facts. We proceed by fine-tuning a compact attention-pooling classifier over encoded atomic facts. Our TLG has achieved a new state-of-the-art performance on the WHOOPS! and WEIRD datasets while leveraging a compact fine-tuning component.- Anthology ID:
- 2025.naacl-srw.28
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, USA
- Editors:
- Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
- Venues:
- NAACL | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 279–293
- Language:
- URL:
- https://preview.aclanthology.org/corrections-2025-06/2025.naacl-srw.28/
- DOI:
- 10.18653/v1/2025.naacl-srw.28
- Cite (ACL):
- Elisei Rykov, Kseniia Petrushina, Kseniia Titova, Anton Razzhigaev, Alexander Panchenko, and Vasily Konovalov. 2025. Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 279–293, Albuquerque, USA. Association for Computational Linguistics.
- Cite (Informal):
- Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images (Rykov et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-06/2025.naacl-srw.28.pdf