Multimodal Safety Evaluation in Generative Agent Social Simulations

Alhim Adonai Vera Gonzalez; Carlos Hinojosa; Karen Sanchez; Haidar Bin Hamid; Donghoon Kim; Bernard Ghanem

Abstract

Can generative agents be trusted in multimodal environments? Despite recent advances, agents remain limited in their ability to reason about safety, coherence, and trust across modalities. We introduce a reproducible simulation framework to evaluate generative agents in three aspects: (1) safety improvement over time via iterative plan revision in multimodal scenarios; (2) detection of unsafe activities across social contexts; and (3) social dynamics, measured through interaction and acceptance rates. These multimodal agents are evaluated using metrics that quantify plan revisions and unsafe-to-safe conversions. Experiments show that while agents detect direct multimodal contradictions, they often fail to align local revisions with global safety, achieving only a 55% success rate in correcting unsafe plans. We release a dataset of 1,000 multimodal plans, yielding more than 600,000 simulation steps. Notably, 45% of unsafe actions are accepted when paired with misleading visual cues, revealing a strong tendency to overtrust visual content. Code is available at https://github.com/AdonaiVera/X-CASE

Anthology ID: 2026.acl-long.1915
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 41295–41310
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1915/
Cite (ACL): Alhim Adonai Vera Gonzalez, Carlos Hinojosa, Karen Sanchez, Haidar Bin Hamid, Donghoon Kim, and Bernard Ghanem. 2026. Multimodal Safety Evaluation in Generative Agent Social Simulations. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41295–41310, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Multimodal Safety Evaluation in Generative Agent Social Simulations (Gonzalez et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1915.pdf
Checklist: 2026.acl-long.1915.checklist.pdf
