Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios

Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao


Abstract
Multimodal large language models (MLLMs) are evolving rapidly, which presents increasingly complex safety challenges. However, current dataset construction methods, which are risk-oriented, fail to cover the growing complexity of real-world multimodal safety scenarios (RMS). Moreover, because there is no unified evaluation metric, their overall effectiveness remains unproven. This paper introduces a novel image-oriented, self-adaptive dataset construction method for RMS, which starts from images and ends by constructing paired texts and guidance responses. Using this image-oriented method, we automatically generate an RMS dataset comprising 35,610 image–text pairs with guidance responses. Additionally, we introduce a standardized metric for evaluating safety datasets: fine-tuning a safety judge model on the dataset under evaluation and testing its capabilities on other safety datasets. Extensive experiments on various tasks demonstrate the effectiveness of the proposed image-oriented pipeline. The results confirm the scalability and effectiveness of the image-oriented approach, offering a new perspective on the construction of real-world multimodal safety datasets.
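The evaluation metric described above can be read as a cross-dataset generalization test: train a judge on the constructed dataset, then score that judge on other safety datasets. The Python sketch below illustrates only that structure, under loudly stated assumptions. The keyword-based `train_toy_judge` is a hypothetical stand-in for actually fine-tuning an MLLM safety judge. Per-dataset accuracy as the score and a plain mean as the aggregate are assumptions, not the paper's definitions. All names and examples are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Example:
    image_id: str    # hypothetical reference to the paired image
    text: str        # text paired with the image
    is_unsafe: bool  # gold safety label


# A judge maps an example to a predicted "unsafe" flag.
Judge = Callable[[Example], bool]


def train_toy_judge(train: List[Example]) -> Judge:
    """Hypothetical stand-in for fine-tuning: flag words seen only in unsafe examples."""
    unsafe_words = {w for ex in train if ex.is_unsafe for w in ex.text.lower().split()}
    safe_words = {w for ex in train if not ex.is_unsafe for w in ex.text.lower().split()}
    flagged = unsafe_words - safe_words

    def judge(ex: Example) -> bool:
        return any(w in flagged for w in ex.text.lower().split())

    return judge


def accuracy(judge: Judge, data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not data:
        raise ValueError("empty evaluation set")
    return sum(judge(ex) == ex.is_unsafe for ex in data) / len(data)


def cross_dataset_score(judge: Judge, benchmarks: Dict[str, List[Example]]) -> Dict[str, float]:
    """Score the judge on each other dataset, plus an assumed mean aggregate."""
    scores = {name: accuracy(judge, data) for name, data in benchmarks.items()}
    scores["mean"] = mean(scores.values())
    return scores


# Toy "constructed dataset" used to train the judge (invented examples).
train = [
    Example("img1", "how to build a bomb", True),
    Example("img2", "how to bake a cake", False),
]

# Toy "other safety datasets" used only for evaluation (invented examples).
benchmarks = {
    "bench_a": [
        Example("x1", "where to buy a bomb", True),
        Example("x2", "where to buy a cake", False),
    ],
    "bench_b": [
        Example("y1", "build a weapon", True),
        Example("y2", "build a birdhouse", False),  # toy judge over-flags "build"
    ],
}

judge = train_toy_judge(train)
scores = cross_dataset_score(judge, benchmarks)
print(scores)
```

In this toy run the judge learns to flag "build" and "bomb", so it is fully correct on `bench_a` but over-flags the benign "build a birdhouse" in `bench_b`. That is the kind of generalization gap a cross-dataset metric is meant to expose.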
Anthology ID: 2025.findings-emnlp.912
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16805–16829
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.912/
DOI: 10.18653/v1/2025.findings-emnlp.912
Cite (ACL): Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, and Jing Shao. 2025. Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16805–16829, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios (Qu et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.912.pdf
Checklist: 2025.findings-emnlp.912.checklist.pdf