Explain the Synth: Interpretable Evaluation of LLM Data Synthesis

Yue Yang, Fan Yang, Yu Bai, Hao Wang


Abstract
Large language models (LLMs) are increasingly used to generate synthetic data, including tabular data, a fundamental data modality across a wide range of domains. Yet current evaluation practices offer limited insight into whether synthetic data preserve the relationships of the real data-generating process or instead introduce plausible-looking artifacts. We present a conceptually simple, interpretable auditing framework that compares the explanatory structure induced by real versus synthetic data. The key idea is to use a transparent rule-based model as a shared explanatory language: we extract rules from real data to summarize how features relate to labels, then examine how this rule structure changes when it is explained using LLM-generated data. Importantly, these rules are derived by an independent rule auditor rather than by the generator itself. The resulting “explanation shift” reveals which relationships the generator preserves, weakens, removes, or newly introduces, offering actionable diagnostics beyond aggregate fidelity scores. We further provide a theoretical perspective that links explanation shift and cross-domain predictive gaps to distribution mismatch within an interpretable hypothesis class. Overall, our approach turns synthetic data evaluation into a human-auditable comparison of explanations, improving transparency for LLM-based tabular synthesis.
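The abstract does not specify which rule learner the auditor uses or how each shift category is decided. Purely as an illustration of the idea of a shared explanatory language, the following minimal Python sketch assumes one-condition rules drawn from a fixed, user-supplied threshold grid (shared by the real and synthetic data so that rules are directly comparable), scored by support and confidence. The names `Rule`, `mine_rules`, `explanation_shift`, the cutoffs `min_support`, `min_conf`, `weak_conf`, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)  # frozen -> hashable and comparable, so rules can be matched across datasets
class Rule:
    """Hypothetical one-condition rule: IF row[feature] <= threshold (or >) THEN label."""
    feature: str
    threshold: float
    le: bool  # True means "<=", False means ">"
    label: int

    def fires(self, row: Row) -> bool:
        value = row[self.feature]
        return value <= self.threshold if self.le else value > self.threshold

    def __str__(self) -> str:
        op = "<=" if self.le else ">"
        return f"{self.feature} {op} {self.threshold:g} -> {self.label}"


def rule_stats(rule: Rule, X: Sequence[Row], y: Sequence[int]) -> Tuple[float, float]:
    """Support (fraction of rows covered) and confidence (share of covered rows with rule.label)."""
    covered = [lab for row, lab in zip(X, y) if rule.fires(row)]
    if not covered:
        return 0.0, 0.0
    confidence = sum(lab == rule.label for lab in covered) / len(covered)
    return len(covered) / len(X), confidence


def mine_rules(X, y, grid, labels, min_support=0.25, min_conf=0.8) -> List[Rule]:
    """Hypothetical auditor: keep every grid rule that is frequent and confident on (X, y)."""
    found = []
    for feature, thresholds in grid.items():
        for t in thresholds:
            for le in (True, False):
                for label in labels:
                    rule = Rule(feature, t, le, label)
                    support, conf = rule_stats(rule, X, y)
                    if support >= min_support and conf >= min_conf:
                        found.append(rule)
    return found


def explanation_shift(real, synth, grid, labels, min_support=0.25, min_conf=0.8, weak_conf=0.5):
    """Sort rules into preserved / weakened / removed / introduced (illustrative criteria only)."""
    real_rules = mine_rules(*real, grid, labels, min_support, min_conf)
    synth_rules = mine_rules(*synth, grid, labels, min_support, min_conf)
    shift = {"preserved": [], "weakened": [], "removed": [], "introduced": []}
    for rule in real_rules:
        support, conf = rule_stats(rule, *synth)
        if rule in synth_rules:  # still passes both cutoffs on synthetic data
            shift["preserved"].append(rule)
        elif support > 0 and conf >= weak_conf:  # fails a cutoff but keeps some signal
            shift["weakened"].append(rule)
        else:
            shift["removed"].append(rule)
    shift["introduced"] = [r for r in synth_rules if r not in real_rules]
    return shift


# Toy data (hypothetical): in the "real" data, income alone determines the label.
GRID = {"age": [40], "income": [50]}
LABELS = [0, 1]
X_real = [{"age": a, "income": i} for a, i in
          [(25, 30), (35, 40), (45, 45), (55, 20), (30, 60), (50, 70), (28, 80), (60, 90)]]
y_real = [0, 0, 0, 0, 1, 1, 1, 1]
# "Synthetic" data that blurs the low-income rule and makes age look predictive.
X_syn = [{"age": a, "income": i} for a, i in
         [(25, 30), (35, 40), (45, 45), (55, 20), (30, 60), (50, 70), (48, 80), (60, 90)]]
y_syn = [0, 0, 1, 1, 1, 1, 1, 1]

shift = explanation_shift((X_real, y_real), (X_syn, y_syn), GRID, LABELS)
for kind, rules in shift.items():
    print(kind, [str(r) for r in rules])
# preserved ['income > 50 -> 1']
# weakened ['income <= 50 -> 0']
# removed []
# introduced ['age > 40 -> 1']
```

Fixing the threshold grid in advance is a simplifying design choice: it guarantees that a rule mined on the synthetic data can be matched exactly against one mined on the real data. A real auditor with learned thresholds would instead need some notion of rule similarity.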
Anthology ID: 2026.acl-long.1995
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 43054–43077
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1995/
Cite (ACL): Yue Yang, Fan Yang, Yu Bai, and Hao Wang. 2026. Explain the Synth: Interpretable Evaluation of LLM Data Synthesis. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 43054–43077, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Explain the Synth: Interpretable Evaluation of LLM Data Synthesis (Yang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1995.pdf
Checklist: 2026.acl-long.1995.checklist.pdf