@inproceedings{souayed-etal-2025-template,
title = "Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications",
author = "Souayed, Belkiss and
Ebling, Sarah and
Gao, Yingqiang",
editor = "Shardlow, Matthew and
Alva-Manchego, Fernando and
North, Kai and
Stodden, Regina and
Saggion, Horacio and
Khallaf, Nouran and
Hayakawa, Akio",
booktitle = "Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.tsar-1.1/",
pages = "1--18",
ISBN = "979-8-89176-176-6",
abstract = "Individuals with intellectual disabilities often have difficulties in comprehending complex texts. While many text-to-image models prioritize photorealism over cognitive accessibility, it is not clear how visual illustrations relate to text simplifications (TS) generated from them. This paper presents a structured vision-language model (VLM) prompting framework for generating cognitively accessible images from simplified texts. We designed five prompt templates (i.e., Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout), each following distinct spatial arrangements while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level TS pairs from four established text simplification datasets (OneStopEnglish, SimPA, Wikipedia, ASSET), we conducted a two-phase evaluation: Phase 1 assessed template effectiveness with CLIP similarity scores, and Phase 2 involved expert annotation of generated images across ten visual styles by four accessibility specialists. Results show that the Basic Object Focus template achieved the highest semantic alignment, indicating that visual minimalism enhances accessibility. Expert evaluation further identified Retro style as the most accessible and Wikipedia as the most effective text source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content creation and underscores the importance of structured prompting in AI-generated visual accessibility tools."
}