Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning

Can Polat, Hasan Kurban, Erchin Serpedin, Mustafa Kurban


Abstract
Evaluating foundation models for crystallographic reasoning requires benchmarks that isolate generalization behavior while enforcing physical constraints. This work introduces xCrysAlloys, a multiscale multicrystal dataset with two physically grounded evaluation protocols to stress-test multimodal generative models. The Spatial-Exclusion benchmark withholds all supercells of a given radius from a diverse dataset, enabling controlled assessments of spatial interpolation and extrapolation. The Compositional-Exclusion benchmark omits all samples of a specific chemical composition, probing generalization across stoichiometries. Nine vision–language foundation models are prompted with crystallographic images and textual context to generate structural annotations. Responses are evaluated via (i) relative errors in lattice parameters and density, (ii) a physics-consistency index penalizing volumetric violations, and (iii) a hallucination score capturing geometric outliers and invalid space-group predictions. These benchmarks establish a reproducible, physically informed framework for assessing generalization, consistency, and reliability in large-scale multimodal models. Dataset and implementation are available at https://github.com/KurbanIntelligenceLab/StressTestingMMFMinCR.
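The two exclusion protocols and the relative-error metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sample` fields, the helper names `exclusion_split` and `relative_error`, and all values are hypothetical placeholders chosen only to show the hold-out idea.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout; the real dataset schema may differ.
    composition: str  # placeholder composition label
    radius: int       # placeholder supercell radius
    a: float          # placeholder lattice parameter


def exclusion_split(
    samples: List[Sample],
    key: Callable[[Sample], Hashable],
    held_out: Hashable,
) -> Tuple[List[Sample], List[Sample]]:
    """Withhold every sample whose key equals held_out; return (kept, withheld)."""
    kept = [s for s in samples if key(s) != held_out]
    withheld = [s for s in samples if key(s) == held_out]
    return kept, withheld


def relative_error(predicted: float, reference: float) -> float:
    """Absolute relative error |predicted - reference| / |reference|."""
    return abs(predicted - reference) / abs(reference)


# Toy data with made-up labels and values, for illustration only.
samples = [
    Sample("A", 1, 4.0),
    Sample("A", 2, 4.0),
    Sample("B", 1, 5.0),
    Sample("B", 2, 5.0),
]

# Spatial-exclusion style split: withhold all samples of one radius.
spatial_kept, spatial_withheld = exclusion_split(samples, lambda s: s.radius, 2)

# Compositional-exclusion style split: withhold all samples of one composition.
comp_kept, comp_withheld = exclusion_split(samples, lambda s: s.composition, "B")

# Relative error of a hypothetical predicted lattice parameter.
err = relative_error(4.2, 4.0)
```

The same splitting function serves both protocols because each one withholds a whole group defined by a single attribute; only the grouping key changes.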
Anthology ID:
2025.knowllm-1.5
Volume:
Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Yuji Zhang, Canyu Chen, Sha Li, Mor Geva, Chi Han, Xiaozhi Wang, Shangbin Feng, Silin Gao, Isabelle Augenstein, Mohit Bansal, Manling Li, Heng Ji
Venues:
KnowLLM | WS
Publisher:
Association for Computational Linguistics
Pages:
49–58
URL:
https://preview.aclanthology.org/landing_page/2025.knowllm-1.5/
Cite (ACL):
Can Polat, Hasan Kurban, Erchin Serpedin, and Mustafa Kurban. 2025. Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning. In Proceedings of the 3rd Workshop on Towards Knowledgeable Foundation Models (KnowFM), pages 49–58, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning (Polat et al., KnowLLM 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.knowllm-1.5.pdf