Quantifying Semantic Emergence in Language Models

Hang Chen; Xinyu Yang; Jiaying Zhu; Wenya Wang

Abstract

Large language models (LLMs) are widely recognized for their exceptional capacity to capture semantic meaning. Yet there remains no established metric to quantify this capability. In this work, we introduce a quantitative metric, Information Emergence (IE), designed to measure LLMs’ ability to extract semantics from input tokens. We formalize “semantics” as the meaningful information abstracted from a sequence of tokens, and we quantify it by comparing the entropy reduction observed for a sequence of tokens (macro-level) with that observed for individual tokens (micro-level). To achieve this, we design a lightweight estimator that computes the mutual information at each transformer layer and is agnostic to the task and the language model architecture. We apply IE in both synthetic in-context learning (ICL) scenarios and natural sentence contexts. Experiments show that IE is informative and reveals patterns in semantics. Some of these patterns confirm conventional linguistic knowledge; others are relatively unexpected and may provide new insights.
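The macro-versus-micro comparison in the abstract can be illustrated with a toy sketch. This is not the paper's estimator, which works on transformer layer representations; it is a minimal discrete example under stated assumptions. The names `mutual_information` and `emergence_gap`, the two-bit states, and the use of parity as the macro-level variable are all hypothetical choices made for illustration, and the gap is assumed here to be macro-level mutual information minus the mean micro-level mutual information.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Toy transition (an assumption, not data from the paper): a two-bit state
# moves to any two-bit state with the same parity, each equally likely.
samples = [
    (before, after)
    for before in product((0, 1), repeat=2)
    for after in product((0, 1), repeat=2)
    if sum(after) % 2 == sum(before) % 2
]

# Micro level: each bit is tracked on its own across the transition.
micro_mi = [
    mutual_information([(b[i], a[i]) for b, a in samples])
    for i in range(2)
]

# Macro level: the parity of the whole state is tracked across the transition.
macro_mi = mutual_information(
    [(sum(b) % 2, sum(a) % 2) for b, a in samples]
)

# Hypothetical gap: macro-level MI minus the average micro-level MI.
emergence_gap = macro_mi - sum(micro_mi) / len(micro_mi)

print(f"micro MI per bit: {micro_mi}")
print(f"macro MI: {macro_mi:.3f} bits")
print(f"gap: {emergence_gap:.3f} bits")
```

In this toy setup each individual bit carries no information about its own next value, while the parity of the pair is fully preserved, so the information is only visible at the macro level.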

Anthology ID: 2025.acl-long.588
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12041–12054
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.588/
Cite (ACL): Hang Chen, Xinyu Yang, Jiaying Zhu, and Wenya Wang. 2025. Quantifying Semantic Emergence in Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12041–12054, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Quantifying Semantic Emergence in Language Models (Chen et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.588.pdf
