Yi-Jun Chen


2026

Indirect speech acts (ISAs) require pragmatic reasoning over context, because directive intent cannot be inferred from surface form alone. Prior text-based studies and existing multimodal benchmarks largely overlook this requirement. They focus instead on explicitly encoded context or on perceptual recognition, and so leave context-dependent pragmatic understanding underexplored, particularly in high-context languages such as Korean. We introduce READI, a multimodal benchmark that evaluates ISA understanding through integrated reasoning over visual context and dialogue. READI models graded indirectness grounded in pragmatic theory, formulates the task as vision-based pragmatic question answering (V-PQA), and supports cross-lingual evaluation in English and Korean. Experiments show that even state-of-the-art multimodal models struggle with visually grounded indirect speech acts, and that their performance declines as indirectness increases. These results underscore the need for benchmarks that explicitly target contextual pragmatic reasoning.
