Yifan Pu


2025

pdf bib
Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip?
Xiaochen Wang | Heming Xia | Jialin Song | Longyu Guan | Qingxiu Dong | Rui Li | Yixin Yang | Yifan Pu | Weiyao Luo | Yiru Wang | Xiangdi Meng | Wenjie Li | Zhifang Sui
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Multimodal Models (LMMs) have demonstrated strong performance on vision-language benchmarks, yet current evaluations predominantly focus on single-image reasoning. In contrast, real-world scenarios always involve understanding sequences of images. A typical scenario is comic strips understanding, which requires models to perform nuanced visual reasoning beyond surface-level recognition. To address this gap, we introduce STRIPCIPHER , a benchmark designed to evaluate the model ability on understanding implicit narratives in silent comics. STRIPCIPHER is a high-quality, human-annotated dataset featuring fine-grained annotations and comprehensive coverage of varying difficulty levels. It comprises three tasks: visual narrative comprehension, contextual frame prediction, and temporal narrative reordering. % , covering various difficulty. Notably, evaluation results on STRIPCIPHER reveals a significant gap between current LMMs and human performance—e.g., GPT-4o achieves only 23.93% accuracy in the reordering task, 56.07% below human levels. These findings underscore the limitations of current LMMs in implicit visual narrative understanding and highlight opportunities for advancing sequential multimodal reasoning.

2024

pdf bib
PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents
Qisen Yang | Zekun Wang | Honghui Chen | Shenzhi Wang | Yifan Pu | Xin Gao | Wenhao Huang | Shiji Song | Gao Huang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Psychological measurement is essential for mental health, self-understanding, and personal development. Traditional methods, such as self-report scales and psychologist interviews, often face challenges with engagement and accessibility. While game-based and LLM-based tools have been explored to improve user interest and automate assessment, they struggle to balance engagement with generalizability. In this work, we propose PsychoGAT (Psychological Game AgenTs) to achieve a generic gamification of psychological assessment. The main insight is that powerful LLMs can function both as adept psychologists and innovative game designers. By incorporating LLM agents into designated roles and carefully managing their interactions, PsychoGAT can transform any standardized scales into personalized and engaging interactive fiction games. To validate the proposed method, we conduct psychometric evaluations to assess its effectiveness and employ human evaluators to examine the generated content across various psychological constructs, including depression, cognitive distortions, and personality traits. Results demonstrate that PsychoGAT serves as an effective assessment tool, achieving statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity. Moreover, human evaluations confirm PsychoGAT’s enhancements in content coherence, interactivity, interest, immersion, and satisfaction.