Jimin Jung


2026

The global deployment of Large Language Models (LLMs) underscores the urgent need to evaluate their cultural alignment. However, assessing genuine "cultural awareness" across modalities (text, vision, speech) and languages remains a significant challenge. To investigate this domain comprehensively, we propose MMAC, a systematic framework comprising a pipeline for creating a tri-modally aligned cultural benchmark and a five-dimensional evaluation protocol. The protocol assesses cross-country disparities in awareness, evaluates cross-lingual and cross-modal consistency, and verifies the generalization and grounding validity of cultural knowledge. Given the prevailing Western cultural bias in current models, we build our dataset on 8 Asian countries to reveal potential cultural deficiencies in LLMs more sharply. Our dataset, MMAC-bench, features 27,000 human-curated questions across 10 languages. Crucially, it is the first dataset aligned at the input level across text, image, and speech, enabling direct cross-modal transfer tests. Each question consists of multiple-choice options accompanied by open-ended generated explanations, and 79% of the questions require multi-step reasoning grounded in cultural context, moving beyond simple memorization. We also probe the causes of cross-modal divergence, offering insights into building culturally robust multimodal LLMs (MLLMs).
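Because the questions are aligned at the input level, a model's answers to the same item can be compared directly across text, image, and speech. The Python sketch below is only an illustration of that idea under stated assumptions: the abstract does not define the protocol's metrics, so the all-modalities-agree rate, the function names, and the toy data here are hypothetical and not taken from MMAC.

```python
from collections import Counter


def cross_modal_consistency(predictions):
    """Fraction of questions answered identically in every modality.

    `predictions` maps question id -> {modality: chosen option}.
    This agreement rate is an assumed, illustrative metric.
    """
    if not predictions:
        return 0.0
    agree = sum(
        1 for answers in predictions.values()
        if len(set(answers.values())) == 1
    )
    return agree / len(predictions)


def per_modality_accuracy(predictions, gold):
    """Accuracy of the chosen options, computed separately per modality."""
    correct, total = Counter(), Counter()
    for qid, answers in predictions.items():
        for modality, choice in answers.items():
            total[modality] += 1
            correct[modality] += int(choice == gold[qid])
    return {m: correct[m] / total[m] for m in total}


# Toy data (hypothetical): four aligned questions, three modalities each.
toy_predictions = {
    "q1": {"text": "A", "image": "A", "speech": "A"},
    "q2": {"text": "B", "image": "C", "speech": "B"},
    "q3": {"text": "D", "image": "D", "speech": "D"},
    "q4": {"text": "A", "image": "A", "speech": "C"},
}
toy_gold = {"q1": "A", "q2": "B", "q3": "C", "q4": "A"}

consistency = cross_modal_consistency(toy_predictions)
accuracy = per_modality_accuracy(toy_predictions, toy_gold)
```

Note that question `q3` in the toy data is consistent across modalities yet wrong in all of them, which is why consistency and accuracy are worth reporting separately.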
The Plain Writing Act in the United States requires government documents to be written in clear and simple language. However, existing summarization systems struggle to address the diverse linguistic and cognitive barriers that general readers face. We propose NRLB (No Reader Left Behind), a unified multi-agent framework for plain language summarization that simulates three representative reader groups: elementary school students, non-native speakers, and readers with attention deficits. NRLB integrates template-based planning with an iterative feedback loop, guided by simulated readers and domain-expert revision, to address comprehension barriers such as unknown terms, missing context, and confusing sentences. Evaluations across multiple datasets demonstrate consistent improvements in both readability and factuality. Human evaluation further supports these findings, with annotator preference rates ranging from 55% to 76%, highlighting NRLB’s ability to generate summaries that are both faithful to the source and accessible to a wide range of readers.
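The iterative feedback loop described above can be pictured as a simple control flow: simulated readers flag barriers, a reviser rewrites the draft, and the process stops once no reader objects or a round limit is reached. The sketch below is a minimal illustration under stated assumptions, not NRLB's implementation: in the actual system the readers and the expert are presumably LLM agents, whereas here they are toy stand-ins, and `refine_summary`, `term_reader`, `glossary_reviser`, and `GLOSSARY` are all hypothetical names.

```python
from typing import Callable, List

# A reader inspects a summary and returns the barriers it found.
Reader = Callable[[str], List[str]]
# A reviser rewrites the summary given the collected barriers.
Reviser = Callable[[str, List[str]], str]


def refine_summary(draft: str, readers: List[Reader], revise: Reviser,
                   max_rounds: int = 3) -> str:
    """Revise `draft` until no reader reports a barrier or rounds run out."""
    summary = draft
    for _ in range(max_rounds):
        feedback = [issue for reader in readers for issue in reader(summary)]
        if not feedback:
            break  # every simulated reader is satisfied
        summary = revise(summary, feedback)
    return summary


# Toy stand-ins (assumptions): a glossary lookup plays both the
# "unknown term" reader and the expert reviser.
GLOSSARY = {"promulgate": "announce", "remit": "send"}


def term_reader(summary: str) -> List[str]:
    """Flag glossary terms that still appear in the summary."""
    return [term for term in GLOSSARY if term in summary]


def glossary_reviser(summary: str, issues: List[str]) -> str:
    """Replace each flagged term with its plainer equivalent."""
    for term in issues:
        summary = summary.replace(term, GLOSSARY[term])
    return summary


result = refine_summary(
    "Agencies must promulgate rules and remit fees.",
    readers=[term_reader],
    revise=glossary_reviser,
)
```

Passing the readers and the reviser in as callables keeps the loop independent of how each agent is realized, so a model-backed reader could be swapped in without changing the control flow.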