@inproceedings{samoylov-vosoughi-2026-representing,
title = "Representing Lean Proofs as Trajectories in Latent Space",
author = "Samoylov, Elisaveta and
Vosoughi, Soroush",
editor = "T.Y.S.S., Santosh and
Rodriguez, Juan Diego and
de Gibert, Ona",
booktitle = "Proceedings of the 64th Annual Meeting of the {A}ssociation for {C}omputational {L}inguistics ({ACL} 2026)",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl/2026.acl-srw.105/",
pages = "1190--1202",
ISBN = "979-8-89176-393-7",
abstract = "Lean proofs are built as sequences of tactic-induced state transitions, yet learned models often represent proof steps primarily through tactic strings or raw proof-state text. Building on Delta Tokens, which encode a proof step by the local edit it induces between successive proof states, we train an encoder-only Transformer to learn contextualized representations of Lean proof steps from state changes. We then use these step representations to study complete proofs as trajectories in a learned latent space.We first show that the Delta-based Transformer yields better held-out next-tactic retrieval than a matched surface-syntax control, supporting the representational choice used in the trajectory analysis. We then analyze proof trajectories using path length, endpoint span, directness, curvature, and torsion. Across the LeanWorkbook slice used here, longer proofs become increasingly indirect within a relatively bounded latent span: path length grows sharply with proof length while endpoint span changes little, mean step size decreases, curvature rises modestly, and torsion falls. Qualitative case studies show that these geometric patterns align with recognizable proof organizations, including immediate closure, aligned accumulation, scaffolded enrichment, bookkeeping-heavy restructuring, and repeated local contradiction work.The dataset is small and heavily skewed toward short proofs, so the claims are necessarily limited. Within those limits, the results suggest that learned state-change representations recover nontrivial structure in how proofs unfold and provide a promising basis for future trajectory-aware theorem proving."
}Markdown (Informal)
[Representing Lean Proofs as Trajectories in Latent Space](https://preview.aclanthology.org/ingest-acl/2026.acl-srw.105/) (Samoylov & Vosoughi, ACL 2026)
ACL
- Elisaveta Samoylov and Soroush Vosoughi. 2026. Representing Lean Proofs as Trajectories in Latent Space. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1190–1202, San Diego, California, United States. Association for Computational Linguistics.