Lost in Space: Finding the Right Tokens for Structured Output

Sil Hamilton, David Mimno


Abstract
General-purpose language models are trained to produce varied natural language outputs, but for some tasks, like annotation or classification, we need more specific output formats. LLM systems increasingly support structured output, which enforces formats by sampling tokens according to a grammar — but also unpredictably reduces downstream performance. Are there systematic differences between grammars that appear semantically (and often visually) similar to humans? To answer this, we test four popular model families with five varying output formats on four common NLP benchmarks. We find all models perform most accurately when guided to use formats respecting convention, such as letters for multiple choice and real numbers for numerical prediction. Performance also improves by 5%-10% when guiding models to return tokens incorporating leading whitespace, with smaller models benefiting the most. We find leading whitespace helps models avoid structural deficiencies in subword token representations. We finally present best practices for researchers using language models as zero-shot classifiers with structured output.
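To make the whitespace point concrete, the following is a minimal sketch of choosing a multiple-choice label under two grammars that look the same to a human but allow different tokens: bare letters ("A") versus letters with a leading space (" A"). Everything in it is an assumption for illustration: the `TOY_LOGPROBS` values are invented, and `label_grammar` and `constrained_choice` are hypothetical helpers, not the authors' code or any library's API.

```python
import math

# Hypothetical next-token log-probabilities after a prompt ending in "Answer:".
# These numbers are invented for illustration; they are not from the paper.
TOY_LOGPROBS = {
    " A": -0.9, " B": -1.6, " C": -2.3, " D": -2.5,  # tokens with leading space
    "A": -4.0, "B": -4.2, "C": -4.6, "D": -4.4,      # bare-letter tokens
    "\n": -3.0,
}


def label_grammar(labels, leading_space):
    """Return the set of token strings a label grammar allows."""
    prefix = " " if leading_space else ""
    return {prefix + label for label in labels}


def constrained_choice(logprobs, allowed):
    """Mask out disallowed tokens, renormalize, and return (best_token, probs)."""
    masked = {tok: lp for tok, lp in logprobs.items() if tok in allowed}
    if not masked:
        raise ValueError("no allowed token appears in the vocabulary")
    total = sum(math.exp(lp) for lp in masked.values())
    probs = {tok: math.exp(lp) / total for tok, lp in masked.items()}
    best = max(probs, key=probs.get)
    return best, probs


labels = ["A", "B", "C", "D"]
bare_best, bare_probs = constrained_choice(TOY_LOGPROBS, label_grammar(labels, False))
spaced_best, spaced_probs = constrained_choice(TOY_LOGPROBS, label_grammar(labels, True))

# Both grammars map back to the same human-readable label set after stripping,
# but each one renormalizes over a different set of vocabulary entries.
print(repr(bare_best), repr(spaced_best))
```

With a real model, the two grammars would read their scores from different vocabulary entries, so the renormalized distributions need not agree; stripping whitespace from the chosen token recovers the plain label in either case.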
Anthology ID:
2026.gem-main.18
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
155–166
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.18/
Cite (ACL):
Sil Hamilton and David Mimno. 2026. Lost in Space: Finding the Right Tokens for Structured Output. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 155–166, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Lost in Space: Finding the Right Tokens for Structured Output (Hamilton & Mimno, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.18.pdf