SAIR-Comb : A Structure-Aware Iterative Refinement Framework for Combinatorics Autoformalization

Weijie Jiang, Gaolei He, Beibei Xiong, Jianlin Wang, Zhengfeng Yang


Abstract
Autoformalization aims to bridge the gap between human mathematical intuition and formal proof by automating the translation of informal reasoning into machine-verifiable languages. Despite significant breakthroughs catalyzed by Large Language Models (LLMs), autoformalizing Combinatorics remains a formidable challenge due to its intricate structural dependencies and the severe scarcity of high-quality formal datasets. To address these challenges, we propose SAIR-Comb, a Structure-Aware Iterative Refinement framework for Combinatorics powered by Lean 4 and LLMs. SAIR-Comb employs a multi-stage pipeline: first, it performs data augmentation and refinement by rectifying syntactic, semantic, and structural errors, guided by a curated manual combinatorics dataset. The model then undergoes a two-stage training regime: expert iteration with syntactic grounding, followed by reinforcement learning (RL) to align formal reasoning trajectories. Furthermore, we introduce Structural Consistency—a rigorous new metric designed to expose formalizing failures that elude traditional semantic-only evaluations. Experiments demonstrate that SAIR-Comb achieves strong performance on the specialized CombiBench while remaining highly competitive on general-domain benchmarks, including PutnamBench and ProverBench.
Anthology ID:
2026.acl-long.1075
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
23452–23473
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1075/
DOI:
Bibkey:
Cite (ACL):
Weijie Jiang, Gaolei He, Beibei Xiong, Jianlin Wang, and Zhengfeng Yang. 2026. SAIR-Comb : A Structure-Aware Iterative Refinement Framework for Combinatorics Autoformalization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23452–23473, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SAIR-Comb : A Structure-Aware Iterative Refinement Framework for Combinatorics Autoformalization (Jiang et al., ACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1075.pdf
Checklist:
 2026.acl-long.1075.checklist.pdf