Representing Lean Proofs as Trajectories in Latent Space

Elisaveta Samoylov; Soroush Vosoughi

Representing Lean Proofs as Trajectories in Latent Space

Abstract

Lean proofs are built as sequences of tactic-induced state transitions, yet learned models often represent proof steps primarily through tactic strings or raw proof-state text. Building on Delta Tokens, which encode a proof step by the local edit it induces between successive proof states, we train an encoder-only Transformer to learn contextualized representations of Lean proof steps from state changes. We then use these step representations to study complete proofs as trajectories in a learned latent space.We first show that the Delta-based Transformer yields better held-out next-tactic retrieval than a matched surface-syntax control, supporting the representational choice used in the trajectory analysis. We then analyze proof trajectories using path length, endpoint span, directness, curvature, and torsion. Across the LeanWorkbook slice used here, longer proofs become increasingly indirect within a relatively bounded latent span: path length grows sharply with proof length while endpoint span changes little, mean step size decreases, curvature rises modestly, and torsion falls. Qualitative case studies show that these geometric patterns align with recognizable proof organizations, including immediate closure, aligned accumulation, scaffolded enrichment, bookkeeping-heavy restructuring, and repeated local contradiction work.The dataset is small and heavily skewed toward short proofs, so the claims are necessarily limited. Within those limits, the results suggest that learned state-change representations recover nontrivial structure in how proofs unfold and provide a promising basis for future trajectory-aware theorem proving.

Anthology ID:: 2026.acl-srw.105
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 1190–1202
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.105/
DOI:
Bibkey:
Cite (ACL):: Elisaveta Samoylov and Soroush Vosoughi. 2026. Representing Lean Proofs as Trajectories in Latent Space. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1190–1202, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Representing Lean Proofs as Trajectories in Latent Space (Samoylov & Vosoughi, ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.105.pdf

PDF Cite Search Fix data