Yabing Shi
2026
PedagogyBench: A Cognitive-Driven Benchmark for Multimodal Instructional Video Understanding
Xiaokang Jin | Jia Zhu | Jingjiang Liu | Yabing Shi | Jueqi Guan | Hao Chen | Pasquale De Meo
Findings of the Association for Computational Linguistics: ACL 2026
Existing video understanding benchmarks mainly emphasize general visual recognition and reasoning, but do not adequately capture the pedagogical logic embedded in instructional videos. To address this gap, we present PedagogyBench, a multimodal benchmark for instructional video understanding grounded in pedagogical cognition. We introduce a pedagogy-driven segmentation strategy and a dual-stream semantic injection pipeline that combines machine pre-annotation with expert refinement, enabling the construction of a dataset organized around a cognitive pyramid with four levels and 20 fine-grained tasks. We further propose the Cognitive Fidelity Score (CFS) to measure the balance of model performance across pedagogical cognitive dimensions. Experiments on 12 multimodal large language models reveal a clear generative gap, where models perform relatively well on discriminative tasks but degrade on higher-order pedagogical diagnosis, often relying on parametric memory rather than grounded visual perception. Project resources are available at https://github.com/Shallcom/PedagogyBench.
2022
EmRel: Joint Representation of Entities and Embedded Relations for Multi-triple Extraction
Benfeng Xu | Quan Wang | Yajuan Lyu | Yabing Shi | Yong Zhu | Jie Gao | Zhendong Mao
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Multi-triple extraction is a challenging task because of informative inter-triple correlations and, consequently, rich interactions across the constituent entities and relations. While existing work explores only entity representations, we propose to explicitly introduce relation representations, represent them jointly with entities, and align them in a novel way to identify valid triples. We perform comprehensive experiments on document-level relation extraction and on joint entity and relation extraction, along with ablations, to demonstrate the advantage of the proposed method.