Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER

Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset


Abstract
Output format is an often unreported factor in LLM evaluations on structured NLP tasks such as slot filling and Named Entity Recognition. This work explores the impact of the structured output format that LLMs are asked to generate. We show that measured performance and reliability depend on the requested format (JSON, XML, or inline key-value pairs). We study four SLU and three NER benchmarks with 13 instruction-tuned open-weight LLMs, using standardized, open-source prompts and parsers. This format-specific evaluation reveals statistically significant swings of 2-46 F1 points depending on the model and dataset. Additionally, we propose a lightweight selection procedure that determines the best format per model-dataset combination using only a small development slice, reducing trial-and-error in practice.
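The selection procedure the abstract describes can be sketched as follows. This is an illustrative sketch, not the authors' code: the format names and the `evaluate_f1` callback are assumptions standing in for prompting a model and scoring its parsed output on a small development slice.

```python
# Illustrative sketch (assumption, not the paper's implementation):
# pick the output format with the highest F1 on a small dev slice,
# per model-dataset combination.

def select_format(dev_examples, formats=("json", "xml", "keyvalue"),
                  evaluate_f1=None):
    """Return (best_format, scores) for the given dev slice.

    `evaluate_f1(dev_examples, fmt)` is a hypothetical callback that
    prompts the model requesting format `fmt`, parses the output, and
    returns an F1 score against the gold annotations.
    """
    scores = {fmt: evaluate_f1(dev_examples, fmt) for fmt in formats}
    return max(scores, key=scores.get), scores

# Toy usage with a stubbed evaluator in place of real LLM calls.
stub_scores = {"json": 0.71, "xml": 0.65, "keyvalue": 0.74}
best, scores = select_format(
    dev_examples=[],  # a small dev slice would go here
    evaluate_f1=lambda ex, fmt: stub_scores[fmt],
)
print(best)  # → keyvalue
```

In practice the candidate set and the scorer would match the paper's setup (standardized prompts and parsers); the stub here only shows the argmax-over-formats logic.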
Anthology ID:
2026.lrec-main.593
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
SIG:
Publisher:
ELRA Language Resources Association
Note:
Pages:
7485–7497
Language:
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.593/
DOI:
Bibkey:
Cite (ACL):
Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, and Sophie Rosset. 2026. Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 7485–7497, Palma de Mallorca, Spain. ELRA Language Resources Association.
Cite (Informal):
Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER (Lepagnol et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.593.pdf