GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction
Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, Eneko Agirre
Abstract
Information Extraction (IE) systems are traditionally domain-specific, requiring costly adaptation that involves expert schema design, data annotation, and model training. While Large Language Models have shown promise in zero-shot IE, performance degrades significantly in unseen domains where label definitions differ. This paper introduces GUIDEX, a novel method that automatically defines domain-specific schemas, infers guidelines, and generates synthetically labeled instances, allowing for better out-of-domain generalization. Fine-tuning Llama 3.1 with GUIDEX sets a new state-of-the-art across seven zero-shot Named Entity Recognition benchmarks. Models trained with GUIDEX gain up to 7 F1 points over previous methods without human-labeled data, and nearly 2 F1 points higher when combined with it. Models trained on GUIDEX demonstrate enhanced comprehension of complex, domain-specific annotation schemas. Code, models, and synthetic datasets are available at neilus03.github.io/guidex.com
- Anthology ID:
- 2025.findings-acl.1245
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24248–24262
- URL:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1245/
- DOI:
- 10.18653/v1/2025.findings-acl.1245
- Cite (ACL):
- Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, and Eneko Agirre. 2025. GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction. In Findings of the Association for Computational Linguistics: ACL 2025, pages 24248–24262, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction (Fuente et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.1245.pdf