Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation

Davan Harrison, Marilyn Walker


Abstract
Slot Error Rate (SER) is the standard metric for evaluating semantic accuracy in meaning-to-text generation, but computing it has historically required domain-specific scripts that do not generalize across datasets. We present a cross-domain SER evaluation framework that replaces hand-crafted rules with a learned slot extraction model. We adapt Llama-3.2-3B-Instruct with LoRA, updating only 0.34% of its parameters, and show that this small adapted model outperforms prompted frontier LLMs by a wide margin on structured extraction across 23 dialogue domains. We further apply overgenerate-and-rank to the extraction task itself, generating multiple candidate meaning representations and selecting the best one with a trained ranker, which improves SER-Accuracy from 75% to 88%. We combine the extraction model with a Natural Language Inference (NLI) verification baseline through learned per-example routing, achieving 90.0% accuracy on held-out evaluation pairs without any domain-specific rule engineering. We compare our framework against published rule-based SER tools and show that our learned approach matches or outperforms hand-crafted scripts on all six comparable domains.
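As a point of reference for the metric named in the abstract, the following is a minimal sketch of one common SER formulation, not the authors' implementation. It assumes a meaning representation can be flattened to slot/value pairs and that SER is the number of missing, incorrect, and added slots divided by the number of slots in the reference; the paper's exact definition may differ (for example, in how repeated slots are counted). The function name `slot_error_rate` and the example slots are hypothetical.

```python
from typing import Mapping


def slot_error_rate(reference: Mapping[str, str], extracted: Mapping[str, str]) -> float:
    """Assumed formulation: (missing + incorrect + added) / number of reference slots."""
    if not reference:
        raise ValueError("reference meaning representation must contain at least one slot")

    # Slots in the reference that the extracted representation lacks.
    missing = sum(1 for slot in reference if slot not in extracted)
    # Slots present in both, but with a different value.
    incorrect = sum(
        1 for slot, value in reference.items()
        if slot in extracted and extracted[slot] != value
    )
    # Slots in the extracted representation that the reference never mentions.
    added = sum(1 for slot in extracted if slot not in reference)

    return (missing + incorrect + added) / len(reference)


# Hypothetical example: one missing slot (price), one wrong value (food),
# and one added slot (family_friendly) against four reference slots.
reference = {"name": "Aromi", "food": "Italian", "area": "riverside", "price": "cheap"}
extracted = {"name": "Aromi", "food": "Chinese", "area": "riverside", "family_friendly": "yes"}
ser = slot_error_rate(reference, extracted)
```

In the framework the abstract describes, the `extracted` side would come from the learned slot extraction model applied to the generated text rather than from domain-specific rules; this sketch covers only the final comparison step.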
Anthology ID:
2026.gem-main.41
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
443–455
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.41/
Cite (ACL):
Davan Harrison and Marilyn Walker. 2026. Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 443–455, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation (Harrison & Walker, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.41.pdf