Jianghangfan Zhang
2026
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
Jianghangfan Zhang | Yibo Yan | Kening Zheng | Xin Zou | Song Dai | Xuming Hu
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities but often struggle with complex, multi-step mathematical reasoning, where minor errors in visual perception or logical deduction can lead to complete failure. While Process Reward Models (PRMs) offer step-by-step supervision, existing multimodal PRMs are limited to being binary verifiers that can identify but not correct errors, offering little explanatory power. To address these deficiencies, we introduce the **Generative Multimodal Process Reward Model (GM-PRM), a novel paradigm that transforms the PRM from a passive judge into an active reasoning collaborator**. Instead of a simple scalar score, GM-PRM provides a fine-grained, interpretable analysis of each reasoning step, evaluating its step intent, visual alignment, and logical soundness. More critically, GM-PRM is trained to generate a corrected version of the first erroneous step it identifies. This unique corrective capability enables our new test-time inference strategy, Refined Best-of-N (Refined-BoN). This framework actively enhances solution quality by using the PRM’s generated correction to guide the policy model toward a more promising reasoning trajectory, thereby improving the diversity and correctness of the solution pool. We demonstrate that GM-PRM achieves state-of-the-art results on multiple multimodal math benchmarks, significantly boosting policy model performance with remarkable data efficiency, requiring only a 20K-sample training dataset.
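A minimal sketch of the Refined Best-of-N loop described above, under stated assumptions. The abstract says only that the PRM flags the first erroneous step, generates a corrected version of it, and that this correction guides the policy toward a better trajectory. Everything else is hypothetical and for illustration only: the `Critique` container, the `generate`/`judge` callables, min-aggregation of step scores, and one refinement per sampled solution. The toy policy and PRM are stand-ins, not GM-PRM itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    """Hypothetical container for a generative PRM's output on one solution."""
    step_scores: List[float]    # per-step correctness scores in [0, 1]
    first_error: Optional[int]  # index of the first erroneous step, or None
    correction: Optional[str]   # PRM-generated replacement for that step, or None


# (problem, prefix steps) -> complete solution that starts with the prefix
Generate = Callable[[str, List[str]], List[str]]
# (problem, steps) -> step-level critique
Judge = Callable[[str, List[str]], Critique]


def solution_score(critique: Critique) -> float:
    # Assumption: score a solution by its weakest step (min aggregation).
    return min(critique.step_scores) if critique.step_scores else 0.0


def refined_best_of_n(problem: str, generate: Generate, judge: Judge, n: int) -> List[str]:
    pool: List[Tuple[List[str], Critique]] = []
    for _ in range(n):
        steps = generate(problem, [])
        critique = judge(problem, steps)
        pool.append((steps, critique))
        if critique.first_error is not None and critique.correction is not None:
            # Keep the steps before the first error, substitute the PRM's
            # correction, and let the policy continue from there.
            prefix = steps[: critique.first_error] + [critique.correction]
            refined = generate(problem, prefix)
            pool.append((refined, judge(problem, refined)))
    # Ordinary Best-of-N selection over the enlarged pool.
    best_steps, _ = max(pool, key=lambda item: solution_score(item[1]))
    return best_steps


# Toy stand-ins for the policy model and the PRM (purely illustrative).
def toy_policy(problem: str, prefix: List[str]) -> List[str]:
    if prefix == ["3 * 4 = 12"]:
        return prefix + ["2 + 12 = 14"]
    return ["2 + 3 = 5", "5 * 4 = 20"]  # ignores operator precedence


def toy_prm(problem: str, steps: List[str]) -> Critique:
    if steps and steps[0] == "3 * 4 = 12":
        return Critique([0.9, 0.95], None, None)
    return Critique([0.1, 0.5], 0, "3 * 4 = 12")


best = refined_best_of_n("2 + 3 * 4", toy_policy, toy_prm, n=2)
print(best)  # ['3 * 4 = 12', '2 + 12 = 14']
```

The key difference from plain Best-of-N is that the candidate pool grows with PRM-guided continuations, so the final selection can pick a trajectory the policy would not have sampled on its own.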
2024
Prior Relational Schema Assists Effective Contrastive Learning for Inductive Knowledge Graph Completion
Ruilin Luo | Jiayi Li | Jianghangfan Zhang | Jing Xiao | Yujiu Yang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Knowledge Graph Completion (KGC) is a task aimed at uncovering the inherent relationships among known knowledge triplets in a Knowledge Graph (KG) and subsequently predicting missing links. There is growing interest in inductive knowledge graph completion, where missing links may involve previously unseen entities. Previous inductive KGC methods mainly rely on descriptive information about entities to improve the representation of unseen entities, but neglect to provide effective prior knowledge for relation modeling. To tackle this challenge, we capture prior schema-level interactions related to relations by leveraging entity type information, thereby providing effective prior constraints when reasoning over newly introduced entities. Moreover, we employ standard in-batch negatives and introduce schema-guided negatives to improve the efficiency of standard contrastive representation learning. Experimental results demonstrate that our approach consistently achieves state-of-the-art performance on various established metrics across multiple benchmark datasets for link prediction. Notably, our method achieves a 20.5% relative increase in Hits@1 on the HumanWiki-Ind dataset.
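To make the negative-sampling idea above concrete, here is a generic InfoNCE-style sketch in which each query is contrasted against all in-batch targets plus an extra per-query list of "schema-guided" negatives. This is an illustration under assumptions, not the paper's loss. The abstract does not give the scoring function, the temperature, or how schema-guided negatives are selected, so cosine similarity, `tau=0.05`, and the caller-supplied `schema_negatives` argument are all hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contrastive_loss(
    queries: List[Vector],
    targets: List[Vector],
    schema_negatives: List[List[Vector]],
    tau: float = 0.05,  # assumed temperature
) -> float:
    """InfoNCE-style loss; targets[i] is the positive for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        # Candidates: every in-batch target (targets[j], j != i, act as
        # negatives) plus this query's extra schema-guided negatives.
        candidates = list(targets) + list(schema_negatives[i])
        logits = [cosine(q, c) / tau for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the positive (index i).
        total += log_z - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.1], [0.1, 1.0]]
plain = contrastive_loss(queries, targets, [[], []])
hard = contrastive_loss(queries, targets, [[[1.0, -0.1]], [[-0.1, 1.0]]])
print(plain < hard)  # True: near-duplicate negatives make the task harder
```

Extra negatives that lie close to the positive raise the loss far more than random in-batch negatives do, which is the usual motivation for mining harder, structure-aware negatives on top of in-batch ones.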