@inproceedings{baitalik-datta-2026-garden,
title = "Garden Path Recovery in Causal and Masked Language Models",
author = "Baitalik, Sanjan and
Datta, Rajashik",
editor = "T.Y.S.S., Santosh and
Rodriguez, Juan Diego and
de Gibert, Ona",
booktitle = "Proceedings of the 64th Annual Meeting of the {A}ssociation for {C}omputational {L}inguistics ({ACL} 2026)",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl/2026.acl-srw.32/",
pages = "383--392",
ISBN = "979-8-89176-393-7",
abstract = "Garden-path sentences offer a controlled probe of English incremental sentence processing because they require a reader to revise an initially plausible parse when a later region disambiguates the structure. We present an architecture-aware comparison of garden-path recovery in causal and masked language models using 100 English garden-path/control pairs (200 sentences) spanning three constructions: NP/Z, where a noun phrase is initially read as a direct object but must be reanalyzed as the subject of a zero-complement clause; NP/S, where a noun phrase must be reanalyzed as the subject of an embedded sentence; and MV/RR, where an apparent main verb must be reanalyzed as a reduced relative modifier. Causal models are evaluated with left-to-right word surprisal, whereas masked models are evaluated with pseudo-surprisal derived from masked language model scoring. Beyond the disambiguating word, we analyze cumulative excess surprisal, area-under-curve recovery summaries, and layer-wise hidden-state divergence between each garden-path sentence and its minimally different control. Across the audit-valid model set, causal models show larger within-model disambiguation effects than masked models overall, with the clearest family-level difference on NP/Z constructions. We interpret this difference cautiously because surprisal and pseudo-surprisal are not numerically commensurable across architectures or tokenizers. The results nevertheless show that architecture changes which recovery signals are observable: decoder-only models exhibit sharper online disruption at the point of syntactic revision, while bidirectional encoders appear comparatively buffered at the disambiguator due to right-context access. More broadly, the findings argue that garden-path evaluation should emphasize recovery dynamics, not merely end-state plausibility or task accuracy."
}