Mo Zhang
2025
Document-level Simplification and Illustration Generation Multimodal Coherence
Yuhang Liu
|
Mo Zhang
|
Zhaoyi Cheng
|
Sarah Ebling
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
We present a novel method for document-level text simplification and automatic illustration generation aimed at enhancing information accessibility for individuals with cognitive impairments. While prior research has primarily focused on sentence- or paragraph-level simplification and text-to-image generation for narrative contexts this work addresses the unique challenges of simplifying long-form documents and generating semantically aligned visuals. The pipeline consists of three stages (1) discourse-aware segmentation using large language models (2) visually grounded description generation via abstraction and (3) controlled image synthesis using state-of-the-art diffusion models including DALLE 3 and FLUX1-dev. We further incorporate stylistic constraints to ensure visual coherence and we conduct a human evaluation measuring comprehension semantic alignment and visual clarity. Experimental results demonstrate that our method effectively combines simplified text and visual content with generated illustrations enhancing textual accessibility.