We present a novel method for document-level text simplification and automatic illustration generation aimed at enhancing information accessibility for individuals with cognitive impairments. While prior research has primarily focused on sentence- or paragraph-level simplification and text-to-image generation for narrative contexts this work addresses the unique challenges of simplifying long-form documents and generating semantically aligned visuals. The pipeline consists of three stages (1) discourse-aware segmentation using large language models (2) visually grounded description generation via abstraction and (3) controlled image synthesis using state-of-the-art diffusion models including DALLE 3 and FLUX1-dev. We further incorporate stylistic constraints to ensure visual coherence and we conduct a human evaluation measuring comprehension semantic alignment and visual clarity. Experimental results demonstrate that our method effectively combines simplified text and visual content with generated illustrations enhancing textual accessibility.
Large Language Models (LLMs) demonstrate significant value in domain-specific applications, benefiting from their fundamental capabilities. Nevertheless, it is still unclear which fundamental capabilities contribute to success in specific domains. Moreover, the existing benchmark-based evaluation cannot effectively reflect the performance of real-world applications. In this survey, we review recent advances of LLMs in domain applications, aiming to summarize the fundamental capabilities and their collaboration. Furthermore, we establish connections between fundamental capabilities and specific domains, evaluating the varying importance of different capabilities. Based on our findings, we propose a reliable strategy for domains to choose more robust backbone LLMs for real-world applications.
Knowledge-grounded dialogue is a task of gener- ating an informative response based on both the dialogue history and external knowledge source. In general, there are two forms of knowledge: manu- ally annotated knowledge graphs and knowledge text from website. From various evaluation viewpoints, each type of knowledge has advantages and downsides. To further distinguish the principles and determinants from the intricate factors, we conduct a thorough experiment and study on the task to answer three essential questions. The ques- tions involve the choice of appropriate knowledge form, the degree of mutual effects between knowl- edge and the model selection, and the few-shot performance of knowledge. Supported by statistical shreds of evidence, we offer conclusive solutions and sensible suggestions for directions and standards of future research.