GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction

Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, Eneko Agirre


Abstract
Information Extraction (IE) systems are traditionally domain-specific, requiring costly adaptation that involves expert schema design, data annotation, and model training. While Large Language Models have shown promise in zero-shot IE, performance degrades significantly in unseen domains where label definitions differ. This paper introduces GUIDEX, a novel method that automatically defines domain-specific schemas, infers guidelines, and generates synthetically labeled instances, allowing for better out-of-domain generalization. Fine-tuning Llama 3.1 with GUIDEX sets a new state-of-the-art across seven zero-shot Named Entity Recognition benchmarks. Models trained with GUIDEX gain up to 7 F1 points over previous methods without human-labeled data, and nearly 2 F1 points higher when combined with it. Models trained on GUIDEX demonstrate enhanced comprehension of complex, domain-specific annotation schemas. Code, models, and synthetic datasets are available at neilus03.github.io/guidex.com
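The gains above are reported in F1 points, which for Named Entity Recognition usually means entity-level micro F1 on a 0 to 100 scale. The following is a minimal sketch of that metric under stated assumptions, not the paper's evaluation code: the `Span` tuple layout, the function name `micro_f1`, and the exact-match criterion (same start, end, and label) are illustrative choices rather than details taken from the source.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start_token, end_token, label).
Span = Tuple[int, int, str]


def micro_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Entity-level micro F1 over a corpus, in the range [0, 1].

    A predicted span counts as correct only if its boundaries and label
    exactly match a gold span in the same sentence (assumed criterion).
    """
    # Pool counts over all sentences before computing precision and recall.
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# One sentence, two gold entities; the model gets one label wrong.
gold_spans = [{(0, 2, "PER"), (5, 6, "LOC")}]
pred_spans = [{(0, 2, "PER"), (5, 6, "ORG")}]
score = micro_f1(gold_spans, pred_spans)
```

Under this convention, a difference of "7 F1 points" corresponds to a difference of 0.07 in the value returned by `micro_f1`.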
Anthology ID:
2025.findings-acl.1245
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24248–24262
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1245/
DOI:
10.18653/v1/2025.findings-acl.1245
Cite (ACL):
Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, and Eneko Agirre. 2025. GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction. In Findings of the Association for Computational Linguistics: ACL 2025, pages 24248–24262, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction (Fuente et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1245.pdf