Enabling LLM Knowledge Analysis via Extensive Materialization

Yujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, Simon Razniewski


Abstract
Large language models (LLMs) have substantially advanced NLP and AI, and besides their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since Petroni et al. (2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an “availability bias” (Tversky and Kahneman, 1973) that prevents the analysis of knowledge (or beliefs) of LLMs beyond the experimenter’s predisposition. To address this challenge, we propose a novel methodology to comprehensively materialize an LLM’s factual knowledge through recursive querying and result consolidation. Our approach is a milestone for LLM research, for the first time providing constructive insights into the scope and structure of LLM knowledge (or beliefs). As a prototype, we extract GPTKB, a knowledge base (KB) comprising 101 million relational triples for over 2.9 million entities, from GPT-4o-mini. We use GPTKB to analyze GPT-4o-mini’s factual knowledge in terms of scale, accuracy, bias, cutoff, and consistency, all at once. Our resource is accessible at https://gptkb.org.
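
As a rough illustration of the recursive querying and result consolidation described in the abstract, the sketch below prompts GPT-4o-mini for relational triples about a seed entity and then re-queries every newly mentioned object entity, breadth-first. The prompt wording, JSON schema, seed entity, stopping criterion, and exact-match deduplication are illustrative assumptions, not the authors' released GPTKB pipeline; see the paper for the actual construction.

```python
# Minimal sketch of recursive LLM knowledge materialization (illustrative only).
import json
from collections import deque

from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt; the paper's actual prompt and schema may differ.
PROMPT = (
    "List factual knowledge about the entity '{entity}' as JSON in the form "
    '{{"triples": [{{"subject": "...", "predicate": "...", "object": "..."}}]}}. '
    "Return JSON only."
)


def elicit_triples(entity: str) -> list[dict]:
    """Ask GPT-4o-mini for (subject, predicate, object) triples about one entity."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(entity=entity)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content).get("triples", [])


def materialize(seed: str, max_entities: int = 100) -> list[dict]:
    """Breadth-first expansion: every newly seen object entity becomes a subject to query."""
    frontier, seen, kb = deque([seed]), {seed}, []
    while frontier and len(seen) < max_entities:
        entity = frontier.popleft()
        for triple in elicit_triples(entity):
            kb.append(triple)
            obj = str(triple.get("object", ""))
            # Naive consolidation: exact-string deduplication of entity names;
            # the paper's result consolidation is more involved.
            if obj and obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return kb


if __name__ == "__main__":
    kb = materialize("Vannevar Bush")  # illustrative seed entity
    print(f"Extracted {len(kb)} triples")
```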
Anthology ID:
2025.acl-long.789
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
16189–16202
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.789/
Cite (ACL):
Yujia Hu, Tuan-Phong Nguyen, Shrestha Ghosh, and Simon Razniewski. 2025. Enabling LLM Knowledge Analysis via Extensive Materialization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16189–16202, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Enabling LLM Knowledge Analysis via Extensive Materialization (Hu et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.789.pdf