Token Knowledge: A New Perspective For Knowledge in Large Language Models

Jieyong Wang, Chunyao Song, Tingjian Ge
Abstract
In the era of flourishing large language models (LLMs), hallucination remains a serious issue hindering their reliability and wider adoption. Predicting the presence (or absence) of specific knowledge in an LLM could aid in avoiding hallucinations. However, LLMs generate text token by token, whereas knowledge is typically stored and evaluated in the form of triples, which makes it difficult to accurately assess an LLM's knowledge boundary. We approach this problem from a novel perspective and, for the first time, introduce the concept of token knowledge in large language models. Accordingly, we propose a method for constructing token knowledge datasets and use the intermediate states during inference to train probes. This allows us to predict whether a specific token will appear in the LLM's generated sequence, without generating a single token. Our approach unlocks the model's latent potential, raising its accuracy in assessing token knowledge from about 60% to over 90%, with strong out-of-distribution generalization from training on just a few dozen prompts. Finally, we apply our method, KEGT, to enhance a state-of-the-art knowledge boundary detection method, achieving improved performance while reducing computational time by over 90%. Furthermore, KEGT can prevent hallucinations in certain cases by leveraging its guidance in the token-level knowledge semantic space. Our code is available at https://github.com/CC-2000/KEGT.
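The probing idea in the abstract can be illustrated with a minimal sketch: train a linear (logistic-regression) probe on hidden-state vectors to predict whether a target token will appear in the generated sequence. Everything below is illustrative; the synthetic data, dimensions, and training loop are assumptions for demonstration, not the paper's actual KEGT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32   # stand-in for the LLM's hidden-state dimensionality
N = 200  # number of (prompt, token) training examples

# Synthetic "hidden states": label is whether the token appears,
# correlated with a hidden direction w_true plus noise.
w_true = rng.normal(size=D)
X = rng.normal(size=(N, D))
y = (X @ w_true + rng.normal(scale=0.5, size=N) > 0).astype(float)

# Train a logistic-regression probe with plain gradient descent.
w = np.zeros(D)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    w -= lr * (X.T @ (p - y) / N)           # gradient step on weights
    b -= lr * float(np.mean(p - y))         # gradient step on bias

# Training accuracy of the probe's binary prediction.
acc = float(np.mean(((X @ w + b) > 0) == (y == 1)))
print(f"probe training accuracy: {acc:.2f}")
```

In the paper's setting, `X` would instead hold intermediate activations collected during inference, and the probe's output would answer "will this token appear?" before any token is generated.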
Anthology ID:
2025.findings-emnlp.418
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7912–7926
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.418/
DOI:
10.18653/v1/2025.findings-emnlp.418
Cite (ACL):
Jieyong Wang, Chunyao Song, and Tingjian Ge. 2025. Token Knowledge: A New Perspective For Knowledge in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7912–7926, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Token Knowledge: A New Perspective For Knowledge in Large Language Models (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.418.pdf
Checklist:
2025.findings-emnlp.418.checklist.pdf