Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications

Belkiss Souayed, Sarah Ebling, Yingqiang Gao


Abstract
Individuals with intellectual disabilities often have difficulties in comprehending complex texts. While many text-to-image models prioritize photorealism over cognitive accessibility, it is not clear how visual illustrations relate to text simplifications (TS) generated from them. This paper presents a structured vision-language model (VLM) prompting framework for generating cognitively accessible images from simplified texts. We designed five prompt templates, i.e., Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout, each following distinct spatial arrangements while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level TS pairs from four established text simplification datasets (OneStopEnglish, SimPA, Wikipedia, and ASSET), we conducted a two-phase evaluation: Phase 1 assessed template effectiveness with CLIP similarity scores, and Phase 2 involved expert annotation of generated images across ten visual styles by four accessibility specialists. Results show that the Basic Object Focus template achieved the highest semantic alignment, indicating that visual minimalism enhances accessibility. Expert evaluation further identified Retro style as the most accessible and Wikipedia as the most effective text source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content creation and underscores the importance of structured prompting in AI-generated visual accessibility tools.
Anthology ID:
2025.tsar-1.1
Volume:
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Matthew Shardlow, Fernando Alva-Manchego, Kai North, Regina Stodden, Horacio Saggion, Nouran Khallaf, Akio Hayakawa
Venues:
TSAR | WS
Publisher:
Association for Computational Linguistics
Pages:
1–18
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.tsar-1.1/
Cite (ACL):
Belkiss Souayed, Sarah Ebling, and Yingqiang Gao. 2025. Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications. In Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), pages 1–18, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications (Souayed et al., TSAR 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.tsar-1.1.pdf