Jianlin Wang


2026

Autoformalization aims to bridge the gap between human mathematical intuition and formal proof by automating the translation of informal reasoning into machine-verifiable languages. Despite significant breakthroughs catalyzed by Large Language Models (LLMs), autoformalizing Combinatorics remains a formidable challenge due to its intricate structural dependencies and the severe scarcity of high-quality formal datasets. To address these challenges, we propose SAIR-Comb, a Structure-Aware Iterative Refinement framework for Combinatorics powered by Lean 4 and LLMs. SAIR-Comb employs a multi-stage pipeline: first, it performs data augmentation and refinement by rectifying syntactic, semantic, and structural errors, guided by a curated manual combinatorics dataset. The model then undergoes a two-stage training regime: expert iteration with syntactic grounding, followed by reinforcement learning (RL) to align formal reasoning trajectories. Furthermore, we introduce Structural Consistency—a rigorous new metric designed to expose formalizing failures that elude traditional semantic-only evaluations. Experiments demonstrate that SAIR-Comb achieves strong performance on the specialized CombiBench while remaining highly competitive on general-domain benchmarks, including PutnamBench and ProverBench.

2025

Automated Theorem Proving is an important and challenging task. Although large language models (LLMs) have demonstrated remarkable potential in mathematical reasoning, their performance in formal theorem proving remains constrained by the scarcity of high-quality supervised fine-tuning (SFT) data. To address this limitation, we propose a **Q**uality-**D**riven **T**heorem **S**ynthesis method (QDTSynth) in Lean4. During the statement synthesis, we enhance Monte Carlo Tree Search (MCTS) with an adaptive adjustment mechanism that dynamically optimizes the search strategy based on the synthesis of statements. In addition, we propose diversity screening and the self-assessment method to select theorems that exhibit both diversity and high quality from the initially synthetic statements, enabling the synthesis of a high-quality Lean4 theorem dataset. After fine-tuning three open-source large language models on our synthetic dataset, experiments on the miniF2F benchmark demonstrate that QDTSynth significantly improves the performance of various open-source LLMs in theorem proving tasks. Our work offers a promising new direction for the future synthesis of high-quality formal mathematical theorems.