Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness

Luca Giordano, Simon Razniewski


Abstract
Large Language Models (LLMs) encode substantial factual knowledge, yet measuring and systematizing this knowledge remains challenging. Converting it into structured format—for example through recursive extraction approaches such as the GPTKB methodology (Hu et al., 2025b)—is still underexplored. Key open questions include whether such extraction can terminate, whether its outputs are reproducible, and how robust they are to variations. We systematically study LLM knowledge materialization using miniGPTKBs (domain-specific, tractable subcrawls), analyzing termination, reproducibility, and robustness across three categories of metrics: yield, lexical similarity, and semantic similarity. We experiment with four variations (seed, language, randomness, model) and three illustrative domains (from history, entertainment, and finance). Our findings show (i) high termination rates, though model-dependent; (ii) mixed reproducibility; and (iii) robustness that varies by perturbation type—high for seeds and temperature, lower for languages and models. These results suggest that LLM knowledge materialization can reliably surface core knowledge, while also revealing important limitations.
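The recursive extraction loop studied in the abstract can be sketched as a breadth-first crawl that terminates when no new entities remain to expand or an entity budget is reached. The sketch below is illustrative only: `TOY_KNOWLEDGE` and `materialize` are hypothetical names, and the dictionary lookup stands in for a prompted LLM query in the actual GPTKB-style pipeline.

```python
from collections import deque

# Hypothetical stand-in for an LLM call: maps a subject to (predicate, object)
# pairs. In a GPTKB-style pipeline this would be a prompted model query.
TOY_KNOWLEDGE = {
    "Ada Lovelace": [("field", "mathematics"),
                     ("collaborator", "Charles Babbage")],
    "Charles Babbage": [("invented", "Analytical Engine")],
}

def materialize(seed, max_entities=100):
    """Recursively materialize triples, breadth-first, from a seed entity.

    Terminates when the frontier is exhausted (no unseen objects left to
    expand) or when the entity budget is reached.
    """
    frontier = deque([seed])
    visited = set()
    triples = []
    while frontier and len(visited) < max_entities:
        subject = frontier.popleft()
        if subject in visited:
            continue
        visited.add(subject)
        for predicate, obj in TOY_KNOWLEDGE.get(subject, []):
            triples.append((subject, predicate, obj))
            if obj not in visited:
                frontier.append(obj)  # recurse on newly seen objects
    return triples
```

Whether such a crawl terminates in practice depends on the model continuing to emit previously seen entities rather than an ever-growing frontier, which is exactly the empirical question the paper examines.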
Anthology ID:
2026.findings-eacl.113
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2145–2164
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.113/
Cite (ACL):
Luca Giordano and Simon Razniewski. 2026. Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2145–2164, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness (Giordano & Razniewski, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.113.pdf
Checklist:
2026.findings-eacl.113.checklist.pdf