Kefan Yu


2026

Social biases in educational materials can subtly shape students’ perceptions of social roles and participation. However, most existing bias benchmarks for Chinese language models focus on text or isolated images, overlooking the multimodal scenes commonly found in educational textbooks. To address this gap, we introduce CANVAS (Chinese ANnotated Visual And Social scenes), a multimodal dataset constructed from Chinese elementary science textbooks and annotated across multiple social dimensions. CANVAS provides fine-grained labels for each depicted character’s demographics, social roles, interactions, and power-related attributes within visual scenes. The dataset is created using a semi-automated pipeline in which a vision–language model generates preliminary structured annotations that are subsequently verified and refined by human annotators. The current release focuses on the Grade 6 science subset and serves as an initial annotated version of the dataset. Using this subset, we present an illustrative case study demonstrating how scene-level and interactional annotations in CANVAS can be used to analyze gender representation in textbook images. By extending bias analysis to full educational scenes, CANVAS provides a new resource for studying representation and fairness in multimodal educational materials and supports future research in NLP, computer vision, and education.
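To make the annotation structure concrete, a scene-level record and the kind of gender-representation count used in the case study could be sketched as below. This is a minimal illustration, not the released CANVAS format: the class names, field names (`gender`, `role`, `is_speaker`), and example IDs are all assumptions.

```python
from dataclasses import dataclass
from collections import Counter

# Hypothetical schema; field names are illustrative, not the released CANVAS format.
@dataclass
class Character:
    gender: str        # e.g. "female", "male", "unspecified"
    role: str          # social role, e.g. "teacher", "student"
    is_speaker: bool   # whether the character leads the depicted interaction

@dataclass
class Scene:
    image_id: str
    grade: int
    characters: list

def gender_role_counts(scenes):
    """Count (gender, role) pairs across all annotated scenes."""
    counts = Counter()
    for scene in scenes:
        for ch in scene.characters:
            counts[(ch.gender, ch.role)] += 1
    return counts

# Toy example with invented image IDs.
scenes = [
    Scene("g6_p012_fig1", 6, [Character("female", "teacher", True),
                              Character("male", "student", False)]),
    Scene("g6_p034_fig2", 6, [Character("male", "teacher", True)]),
]
counts = gender_role_counts(scenes)
print(counts[("female", "teacher")])  # 1
```

Aggregating over such records is what allows scene-level and interactional labels (e.g., who leads an interaction) to be compared across demographic groups.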
Current large language models (LLMs) have demonstrated emerging capabilities in social intelligence tasks, including implicature resolution and theory-of-mind reasoning, both of which require substantial pragmatic understanding. However, how LLMs acquire this pragmatic competence throughout the training process remains poorly understood. In this work, we introduce ALTPRAG, a dataset grounded in the pragmatic concept of alternatives, to evaluate whether LLMs at different training stages can accurately infer nuanced speaker intentions. Each instance pairs two equally plausible yet pragmatically divergent continuations and requires the model to (i) infer the speaker’s intended meaning and (ii) explain when and why a speaker would choose one utterance over its alternative, thus directly probing pragmatic competence through contrastive reasoning. To examine the development of pragmatic competence, we systematically evaluate 22 LLMs at three key training stages: after pre-training, after supervised fine-tuning (SFT), and after preference optimization. Our results show that even base models exhibit notable sensitivity to pragmatic cues, which improves consistently with increases in model and data scale. Additionally, SFT and preference optimization (e.g., RLHF) contribute further gains, particularly in cognitive-pragmatic scenarios. These findings highlight pragmatic competence as an emergent and compositional property of LLM training and offer new insights for aligning models with human communicative norms.
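The contrastive evaluation described above can be sketched as follows. The item structure, field names, and the keyword-based explanation check are all illustrative assumptions, not the actual ALTPRAG format or scoring protocol.

```python
# Hypothetical ALTPRAG-style item: an instance pairs two plausible continuations,
# and the model must (i) pick the one matching the speaker's intent and
# (ii) explain why a speaker would choose it over the alternative.
def score_item(item, model_choice, model_explanation):
    """Return (choice_correct, explanation_ok) for one contrastive item."""
    choice_correct = model_choice == item["gold_choice"]
    # Toy explanation check: does the explanation contain the gold rationale keyword?
    explanation_ok = item["rationale_keyword"] in model_explanation.lower()
    return choice_correct, explanation_ok

# Invented example item.
item = {
    "context": "A: Did you like the cake I baked?",
    "continuations": ["B: It looked lovely.", "B: It was delicious."],
    "gold_choice": 0,               # the speaker praises appearance, not taste
    "rationale_keyword": "avoid",   # gold rationale: avoiding direct comment on taste
}
ok, expl = score_item(item, 0, "B avoids commenting on the taste to stay polite.")
print(ok, expl)  # True True
```

In a real evaluation, the explanation would be judged by humans or a stronger model rather than keyword matching; the sketch only shows how choice accuracy and explanation quality are scored separately.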