GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization

Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, Jiaqi Wang


Abstract
Recent progress in large language models (LLMs) has boosted mathematical reasoning, yet geometry remains challenging where auxiliary construction is often essential. Prior methods either underperform or depend on very large models (e.g., GPT-4o), making them costly. We argue that reinforcement learning with verifiable rewards (e.g., GRPO) can train smaller models to couple auxiliary construction with solid geometric reasoning. However, naively applying GRPO yields unconditional rewards, encouraging indiscriminate and sometimes harmful constructions. We propose Group Contrastive Policy Optimization (GCPO), an RL framework with two components: (1) Group Contrastive Masking, which assigns positive/negative construction rewards based on contextual utility, and (2) a Length Reward that encourages longer reasoning chains. On top of GCPO, we build GeometryZero, an affordable family of geometry reasoning models that selectively use auxiliary construction. Experiments on Geometry3K and MathVista show GeometryZero consistently outperforms RL baselines (e.g., GRPO, ToRL).
Anthology ID:
2026.findings-acl.1392
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
27948–27963
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1392/
DOI:
Bibkey:
Cite (ACL):
Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, and Jiaqi Wang. 2026. GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27948–27963, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
GeometryZero: Advancing Geometry Solving via Group Contrastive Policy Optimization (Wang et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1392.pdf
Checklist:
 2026.findings-acl.1392.checklist.pdf