Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER
Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, Sophie Rosset
Abstract
Output format is often an unreported factor in LLM evaluations for structured NLP tasks such as slot filling or Named Entity Recognition. This work explores the impact of the structured output format generated by LLMs. We show that measured performance and reliability depend on the requested format (JSON, XML, or inline key-value pairs). We conduct a study across four SLU and three NER benchmarks with 13 instruction-tuned open-weight LLMs, using standardized, open-source prompts and parsers. This format-specific evaluation reveals statistically significant swings of 2–46 F1 points depending on model and dataset. Additionally, we propose a lightweight selection procedure that determines the best format per model-dataset combination using only a small development slice, thereby reducing trial-and-error in practice.
- Anthology ID:
- 2026.lrec-main.593
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 7485–7497
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.593/
- Cite (ACL):
- Pierre Lepagnol, Sahar Ghannay, Thomas Gerald, Christophe Servan, and Sophie Rosset. 2026. Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 7485–7497, Palma de Mallorca, Spain. ELRA Language Resource Association.
- Cite (Informal):
- Format Matters: A Critical Evaluation of Output Formats for Prompting LLMs in SLU and NER (Lepagnol et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.593.pdf
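The lightweight format-selection procedure mentioned in the abstract can be sketched as follows: score each candidate output format (JSON, XML, inline key-values) on a small development slice and keep the one with the highest F1. All function names, the span-level F1 metric, and the `predict_fn` interface below are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch: pick the best output format for a (model, dataset)
# pair using only a small development slice, as the abstract describes.

FORMATS = ["json", "xml", "keyvalue"]

def span_f1(predicted, gold):
    """Micro F1 over (label, surface-text) spans for one example."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_format(dev_slice, predict_fn, formats=FORMATS):
    """Return (best_format, scores).

    predict_fn(example, fmt) is assumed to prompt the LLM requesting the
    given output format and parse the answer back into a list of
    (label, text) spans; parse failures should yield an empty list,
    so unreliable formats are penalized automatically.
    """
    scores = {}
    for fmt in formats:
        f1s = [span_f1(predict_fn(ex, fmt), ex["spans"]) for ex in dev_slice]
        scores[fmt] = sum(f1s) / len(f1s)
    return max(scores, key=scores.get), scores
```

Because the selection runs on a small dev slice only, it stays cheap relative to re-evaluating every format on the full test set.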