Neil De La Fuente


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction
Neil De La Fuente | Oscar Sainz | Iker García-Ferrero | Eneko Agirre
Findings of the Association for Computational Linguistics: ACL 2025

Information Extraction (IE) systems are traditionally domain-specific, requiring costlyadaptation that involves expert schema design,data annotation, and model training. WhileLarge Language Models have shown promisein zero-shot IE, performance degrades significantly in unseen domains where label definitions differ. This paper introduces GUIDEX,a novel method that automatically definesdomain-specific schemas, infers guidelines,and generates synthetically labeled instances,allowing for better out-of-domain generalization. Fine-tuning Llama 3.1 with GUIDEXsets a new state-of-the-art across seven zeroshot Named Entity Recognition benchmarks.Models trained with GUIDEX gain up to 7 F1points over previous methods without humanlabeled data, and nearly 2 F1 points higherwhen combined with it. Models trained onGUIDEX demonstrate enhanced comprehension of complex, domain-specific annotationschemas. Code, models, and synthetic datasetsare available at neilus03.github.io/guidex.com