Kaijing Ma
2026
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Zhongyuan Peng | Yifan Yao | Kaijing Ma | Shuyue Guo | Yizhe Li | Yichi Zhang | Chenchen Zhang | Yifan Zhang | Zhouliang Yu | Luming Li | Minghao Liu | Yihang Xia | Jiawei Shen | Yuchen Wu | Yixin Cao | Zhaoxiang Zhang | Wenhao Huang | Jiaheng Liu | Ge Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zhongyuan Peng | Yifan Yao | Kaijing Ma | Shuyue Guo | Yizhe Li | Yichi Zhang | Chenchen Zhang | Yifan Zhang | Zhouliang Yu | Luming Li | Minghao Liu | Yihang Xia | Jiawei Shen | Yuchen Wu | Yixin Cao | Zhaoxiang Zhang | Wenhao Huang | Jiaheng Liu | Ge Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase—the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, first, we propose the CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. Then, we introduce CriticLeanBench, a benchmark designed to measure models’ ability to distinguish semantically correct from incorrect formalizations, and demonstrate that our trained CriticLeanGPT models can significantly outperform strong open- and closed-source baselines. Building on the CriticLean framework, we construct FineLeanCorpus, a dataset comprising over 509K problems that exhibits rich domain diversity, broad difficulty coverage, and high correctness based on human evaluation.Overall, our findings highlight that optimizing the critic phase is essential for producing reliable formalizations and we hope our CriticLean will provide valuable insights for future advances in formal mathematical reasoning.
2025
KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model’s Reasoning Path Aggregation
Siyuan Fang | Kaijing Ma | Tianyu Zheng | Xeron Du | Ningxuan Lu | Ge Zhang | Qingkun Tang
Findings of the Association for Computational Linguistics: ACL 2025
Siyuan Fang | Kaijing Ma | Tianyu Zheng | Xeron Du | Ningxuan Lu | Ge Zhang | Qingkun Tang
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) demonstrate exceptional performance across a variety of tasks, yet they are often affected by hallucinations and the timeliness of knowledge. Leveraging knowledge graphs (KGs) as external knowledge sources has emerged as a viable solution, but existing methods for LLM-based knowledge graph question answering (KGQA) are often limited by step-by-step decision-making on KGs, restricting the global planning and reasoning capabilities of LLMs, or they require fine-tuning or pre-training on specific KGs. To address these challenges, we propose Knowledge graph Assisted Reasoning Path Aggregation (KARPA), a novel framework that harnesses the global planning abilities of LLMs for efficient and accurate KG reasoning. KARPA operates in three steps: pre-planning relation paths using the LLM’s global planning capabilities, matching semantically relevant paths via an embedding model, and reasoning over these paths to generate answers. Unlike existing KGQA methods, KARPA avoids stepwise traversal, requires no additional training, and is adaptable to various LLM architectures. Extensive experimental results show that KARPA achieves state-of-the-art performance in KGQA tasks, delivering both high efficiency and accuracy.
2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu | Yizhi Li | Kang Zhu | Ge Zhang | Yiming Liang | Kaijing Ma | Chenghao Xiao | Haoran Zhang | Bohao Yang | Wenhu Chen | Wenhao Huang | Noura Al Moubayed | Jie Fu | Chenghua Lin
Findings of the Association for Computational Linguistics: ACL 2024
Siwei Wu | Yizhi Li | Kang Zhu | Ge Zhang | Yiming Liang | Kaijing Ma | Chenghao Xiao | Haoran Zhang | Bohao Yang | Wenhu Chen | Wenhao Huang | Noura Al Moubayed | Jie Fu | Chenghua Lin
Findings of the Association for Computational Linguistics: ACL 2024
Multi-modal information retrieval (MMIR) is a rapidly evolving field where significant progress has been made through advanced representation learning and cross-modality alignment research, particularly in image-text pairing.However, current benchmarks for evaluating MMIR performance on image-text pairings overlook the scientific domain, which has a notable gap with the generic data since the caption of scientific charts and tables usually describes the analysis of experimental results or scientific principles in contrast to human activity or scenery depicted in generic images.To bridge this gap, we develop a scientific domain-specific MMIR benchmark (SciMMIR) by leveraging open-access research paper corpora to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions from scientific documents.We further annotate the image-text pairs with a two-level subset-subcategory hierarchy to facilitate a more comprehensive evaluation of the baselines. We conduct zero-shot and fine-tuned evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP, BLIP, and BLIP-2.Our findings offer critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the effects of different visual and textual encoders.
Search
Fix author
Co-authors
- Ge Zhang 3
- Wenhao Huang 2
- Noura Al Moubayed 1
- Yixin Cao 1
- Wenhu Chen 1
- Xeron Du 1
- Siyuan Fang 1
- Jie Fu 1
- Shuyue Guo 1
- Luming Li 1
- Yizhe Li 1
- Yizhi Li 1
- Yiming Liang 1
- Chenghua Lin 1
- Jiaheng Liu 1
- Minghao Liu 1
- Ningxuan Lu 1
- Zhongyuan Peng 1
- Jiawei Shen 1
- Qingkun Tang 1
- Siwei Wu 1
- Yuchen Wu 1
- Yihang Xia 1
- Chenghao Xiao 1
- Bohao Yang 1
- Yifan Yao 1
- Zhouliang Yu 1
- Chenchen Zhang 1
- Haoran Zhang 1
- Yichi Zhang 1
- Yifan Zhang 1
- Zhaoxiang Zhang 1
- Tianyu Zheng 1
- Kang Zhu 1