Tuan-Phong Nguyen


2025

pdf bib
Enabling LLM Knowledge Analysis via Extensive Materialization
Yujia Hu | Tuan-Phong Nguyen | Shrestha Ghosh | Simon Razniewski
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) have majorly advanced NLP and AI, and next to their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an “availability bias” (Tverski and Kahnemann, 1973) that prevents the analysis of knowledge (or beliefs) of LLMs beyond the experimenter’s predisposition.To address this challenge, we propose a novel methodology to comprehensively materialize an LLM’s factual knowledge through recursive querying and result consolidation. Our approach is a milestone for LLM research, for the first time providing constructive insights into the scope and structure of LLM knowledge (or beliefs).As a prototype, we extract a knowledge base (KB) comprising 101 million relational triples for over 2.9 million entities from GPT-4o-mini. We use GPTKB to exemplarily analyze GPT-4o-mini’s factual knowledge in terms of scale, accuracy, bias, cutoff and consistency, at the same time. Our resource is accessible at https://gptkb.org.

2022

pdf bib
Materialized Knowledge Bases from Commonsense Transformers
Tuan-Phong Nguyen | Simon Razniewski
Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)

Starting from the COMET methodology by Bosselut et al. (2019), generating commonsense knowledge directly from pre-trained language models has recently received significant attention. Surprisingly, up to now no materialized resource of commonsense knowledge generated this way is publicly available. This paper fills this gap, and uses the materialized resources to perform a detailed analysis of the potential of this approach in terms of precision and recall. Furthermore, we identify common problem cases, and outline use cases enabled by materialized resources. We posit that the availability of these resources is important for the advancement of the field, as it enables an off-the-shelf-use of the resulting knowledge, as well as further analyses on its strengths and weaknesses.

2021

pdf bib
Inside ASCENT: Exploring a Deep Commonsense Knowledge Base and its Usage in Question Answering
Tuan-Phong Nguyen | Simon Razniewski | Gerhard Weikum
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

ASCENT is a fully automated methodology for extracting and consolidating commonsense assertions from web contents (Nguyen et al., 2021). It advances traditional triple-based commonsense knowledge representation by capturing semantic facets like locations and purposes, and composite concepts, i.e., subgroups and related aspects of subjects. In this demo, we present a web portal that allows users to understand its construction process, explore its content, and observe its impact in the use case of question answering. The demo website (https://ascent.mpi-inf.mpg.de) and an introductory video (https://youtu.be/qMkJXqu_Yd4) are both available online.