Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications
Huitong Pan, Qi Zhang, Mustapha Adamu, Eduard Dragut, Longin Jan Latecki
Abstract
We present a taxonomy-driven framework for constructing domain-specific knowledge graphs (KGs) that integrates structured taxonomies, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). Although we focus on climate science to illustrate its effectiveness, our approach can potentially be adapted for other specialized domains. Existing methods often neglect curated taxonomies (hierarchies of verified entities and relationships), and LLMs frequently struggle to extract KGs in specialized domains. Our approach addresses these gaps by anchoring extraction to expert-curated taxonomies, aligning entities and relations with domain semantics, and validating LLM outputs using RAG against the domain taxonomy. Through a climate science case study using our annotated dataset of 25 publications (1,705 entity-publication links, 3,618 expert-validated relationships), we demonstrate that taxonomy-guided LLM prompting combined with RAG-based validation reduces hallucinations by 23.3% while improving F1 scores by 13.9% compared to baselines without the proposed techniques. Our contributions include: 1) a generalizable methodology for taxonomy-aligned KG construction; 2) a reproducible annotation pipeline; 3) the first benchmark dataset for climate science information retrieval; and 4) empirical insights into combining structured taxonomies with LLMs for specialized domains. The dataset, including expert annotations and taxonomy-aligned outputs, is publicly available at https://github.com/Jo-Pan/ClimateIE, and the accompanying framework can be accessed at https://github.com/Jo-Pan/TaxoDrivenKG.
- Anthology ID:
- 2025.findings-acl.223
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venues:
- Findings | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4295–4320
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.223/
- Cite (ACL):
- Huitong Pan, Qi Zhang, Mustapha Adamu, Eduard Dragut, and Longin Jan Latecki. 2025. Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4295–4320, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Taxonomy-Driven Knowledge Graph Construction for Domain-Specific Scientific Applications (Pan et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.223.pdf
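
The abstract describes validating LLM outputs against the domain taxonomy. As a rough illustration of that idea only, and not the authors' implementation, the sketch below checks LLM-extracted entity mentions against a toy taxonomy and rejects those with no close match. Everything in it is an assumption made for illustration: the `TAXONOMY` contents, the function names `retrieve` and `validate`, the 0.8 threshold, and the use of `difflib` string similarity as a stand-in for a real retriever.

```python
from difflib import SequenceMatcher

# Hypothetical toy taxonomy (canonical term -> parent category).
# These entries are illustrative and are not taken from the paper.
TAXONOMY = {
    "sea surface temperature": "ocean variable",
    "precipitation": "atmospheric variable",
    "community earth system model": "climate model",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve(mention: str, taxonomy: dict, k: int = 1) -> list:
    """Return the k taxonomy terms most similar to the mention.

    Assumption: plain string similarity stands in for the retrieval
    step; a real system would likely use a different retriever.
    """
    scored = [(term, similarity(mention, term)) for term in taxonomy]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def validate(mentions: list, taxonomy: dict, threshold: float = 0.8):
    """Split mentions into taxonomy-aligned and rejected ones.

    A mention is accepted (and mapped to its canonical term) only if
    its best taxonomy match scores at or above the threshold; anything
    else is treated as a possible hallucination and rejected.
    """
    accepted, rejected = {}, []
    for mention in mentions:
        term, score = retrieve(mention, taxonomy, k=1)[0]
        if score >= threshold:
            accepted[mention] = term
        else:
            rejected.append(mention)
    return accepted, rejected


if __name__ == "__main__":
    # Hypothetical LLM output: two plausible mentions and one spurious one.
    llm_output = ["Sea Surface Temperature", "precipitations", "quantum flux index"]
    accepted, rejected = validate(llm_output, TAXONOMY)
    print("accepted:", accepted)
    print("rejected:", rejected)
```

The threshold trades recall for precision: a higher value rejects more spurious mentions but may also drop legitimate surface variants of taxonomy terms.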