SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation

Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci


Abstract
Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet suffer from two fundamental limitations: (1) they model feature dependencies densely, introducing spurious correlations; and (2) they assume static relationships between features, ignoring how these dependencies vary with feature values. To overcome these limitations, we introduce SAGE (Sparse Adaptive Guidance), a novel LLM-based generation framework that enforces sparse and dynamic dependency guidance. SAGE discretizes features into value-aware pseudo-features and constructs a mutual information-based sparse dependency graph. This graph adaptively guides generation through explicit context selection or implicit logit correction, enabling LLMs to focus on truly relevant information during synthesis. Our extensive experiments across six datasets and multiple tasks reveal that SAGE not only improves data fidelity and downstream utility, boosting F1 scores by 10% compared to previous LLM-based methods, but also reduces policy violations by one point. These results highlight the importance of adaptive structure in tabular data generation and provide new insights into context-sensitive control of LLMs.
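The abstract names two preprocessing steps: discretizing features into value-aware pseudo-features, and building a sparse dependency graph from mutual information. The following is a minimal sketch of one way those steps could look, not the authors' implementation. The equal-frequency binning, the binary indicator per (column, value) pair, the top-k sparsification rule, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, n_bins=2):
    """Equal-frequency binning (an assumed choice): map each number to a bin index."""
    ranked = sorted(values)
    # Cut points sit at the quantile boundaries of the sorted values.
    cuts = [ranked[len(ranked) * i // n_bins] for i in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def sparse_dependency_graph(table, top_k=1):
    """Build a top-k neighbor graph over pseudo-features.

    table maps a column name to a list of discrete values. Each pseudo-feature
    is a (column, value) pair, represented as a binary indicator over rows.
    Returns a dict: pseudo-feature -> list of (neighbor, mutual information).
    """
    indicators = {
        (col, val): [int(v == val) for v in vals]
        for col, vals in table.items()
        for val in sorted(set(vals))
    }
    scores = {key: [] for key in indicators}
    for a, b in combinations(indicators, 2):
        if a[0] == b[0]:
            continue  # pseudo-features of the same column are not linked
        mi = mutual_information(indicators[a], indicators[b])
        scores[a].append((b, mi))
        scores[b].append((a, mi))
    # Sparsify: keep only the top_k highest-MI neighbors of each pseudo-feature.
    return {
        key: sorted(nbrs, key=lambda t: -t[1])[:top_k]
        for key, nbrs in scores.items()
    }


# Toy table (hypothetical data): age determines income, color is unrelated.
toy = {
    "age": discretize([20, 25, 30, 60, 65, 70], n_bins=2),
    "income": ["low", "low", "low", "high", "high", "high"],
    "color": ["r", "g", "r", "g", "r", "g"],
}
graph = sparse_dependency_graph(toy, top_k=1)
for node, neighbors in graph.items():
    print(node, "->", neighbors)
```

In this toy table the strongest neighbor of each age pseudo-feature is an income pseudo-feature, while the color column carries little information about either. A graph like this could then decide which already-generated values are shown to the LLM as context when the next feature is generated; how SAGE actually uses the graph is described in the paper itself.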
Anthology ID:
2026.acl-long.174
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3792–3807
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.174/
Cite (ACL):
Shuo Yang, Zheyu Zhang, Bardh Prenkaj, and Gjergji Kasneci. 2026. SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3792–3807, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation (Yang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.174.pdf
Checklist:
 2026.acl-long.174.checklist.pdf