Jiangshan Guan

2026

BloomEval: A Bloom’s Cognitive Taxonomy-Based Benchmark for Evaluating LRMs via Cognitive Hierarchy Trace
Zhiyi Duan | Lei Gao | Jiangshan Guan | Qi Wang | Rui Liu
Findings of the Association for Computational Linguistics: ACL 2026

Current benchmarks for Large Reasoning Models (LRMs) primarily rely on answer correctness, failing to assess the structural coherence and cognitive soundness of the reasoning process itself. To address this gap, we introduce Cognitive Hierarchy Trace (CHT), a novel evaluation framework grounded in Bloom’s Cognitive Taxonomy (BCT). CHT provides a structured, step-wise mapping of a model’s reasoning trajectory onto hierarchical cognitive levels, enabling the detection of structural anomalies such as hierarchy jumps, breaks, and overthinking. Based on CHT, we present BloomEval, the first large-scale benchmark designed for fine-grained cognitive capability assessment. It comprises 94,602 math problems, each annotated with Bloom’s cognitive levels, CHT trajectories, a three-tier knowledge hierarchy, and problem difficulty. To ensure scalable yet reliable annotation, we develop an Expert-LLM collaborative pipeline with a three-stage reconciliation mechanism. Our comprehensive evaluation reveals a critical finding: models often arrive at correct answers through cognitively flawed or opaque reasoning paths. The CHT-based analysis uncovers prevalent structural inconsistencies that are invisible to outcome-only metrics, demonstrating that answer accuracy is an insufficient proxy for reasoning quality.

From Coarse to Fine: A Multi-Granularity Multimodal Framework for Teacher Sentiment Analysis
Zhiyi Duan | Xiangren Wang | Jiangshan Guan | Bing Jia | Qianli Xing
Findings of the Association for Computational Linguistics: ACL 2026

Teacher sentiment analysis is pivotal for understanding instructional dynamics, yet it remains challenging because classroom expressions are professionally regulated performances rather than spontaneous outbursts. Existing approaches typically treat sentiment as a static, monolithic label, failing to capture this structured heterogeneity. To model this complexity, we decompose teacher sentiment into three granularities: coarse-level performativity, medium-level intra-class heterogeneity, and fine-level cross-modal complementarity. Guided by this perspective, we propose CF-TSA, a coarse-to-fine multimodal framework. Specifically, we employ CLS-guided cross-modal attention to recover effective signals from regulated displays (coarse-level), thresholded substyle discovery to identify latent pedagogical styles (medium-level), and substyle-aware contrastive learning to align dynamic multimodal cue compositions (fine-level). Experiments on T-MED and CMU-MOSEI demonstrate that CF-TSA consistently outperforms state-of-the-art baselines, validating the effectiveness of the coarse-to-fine perspective and the hierarchical modeling.

Co-authors

Xiangren Wang 1

Qianli Xing 1

Venues

Findings 2
