Toni J.B. Liu
2025
What’s in a prompt? Language models encode literary style in prompt embeddings
Raphaël Sarfati | Haley Moller | Toni J.B. Liu | Nicolas Boulle | Christopher Earls
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We use literary pieces to show that information about intangible, rather than factual, aspects of the prompt is contained in deep representations. We observe that short excerpts (10–100 tokens) from different novels separate in the latent space independently of the next-token prediction they converge towards. Ensembles of excerpts from books by the same author are much more entangled than those from different authors, suggesting that embeddings encode stylistic features. This geometry of style may have applications for authorship attribution and literary analysis, but most importantly reveals the sophistication of information processing and compression accomplished by language models.
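The within- versus cross-author entanglement described above can be illustrated with a toy sketch. Synthetic Gaussian vectors stand in for actual transformer prompt embeddings, and `sample_book_embeddings`, its parameters, and the centroid-cosine comparison are all illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_book_embeddings(author_dir, n_excerpts=50, dim=64, noise=0.3):
    """Simulate embeddings of short excerpts from one book: an author-specific
    direction, a small per-book offset, and per-excerpt noise."""
    book_offset = 0.1 * rng.standard_normal(dim)
    return author_dir + book_offset + noise * rng.standard_normal((n_excerpts, dim))

def centroid_cosine(a, b):
    """Cosine similarity between the mean embeddings of two excerpt sets."""
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    return float(ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb)))

dim = 64
author_A = rng.standard_normal(dim)
author_B = rng.standard_normal(dim)

book_A1 = sample_book_embeddings(author_A)
book_A2 = sample_book_embeddings(author_A)  # same author, different book
book_B1 = sample_book_embeddings(author_B)  # different author

same_author = centroid_cosine(book_A1, book_A2)
cross_author = centroid_cosine(book_A1, book_B1)
print(f"same-author similarity:  {same_author:.3f}")
print(f"cross-author similarity: {cross_author:.3f}")
```

Under these assumptions, the same-author centroids are far more aligned than the cross-author ones, mirroring the entanglement the abstract reports; with real embeddings the `sample_book_embeddings` generator would be replaced by a forward pass through a language model.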
2024
LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law
Toni J.B. Liu | Nicolas Boulle | Raphaël Sarfati | Christopher Earls
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We study LLMs’ ability to extrapolate the behavior of various dynamical systems, including stochastic, chaotic, continuous, and discrete systems, whose evolution is governed by principles of physical interest. Our results show that LLaMA-2, a language model trained on text, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of a neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.
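The idea of extracting a probability density over multi-digit numbers can be sketched by chaining per-digit conditional probabilities. This is a simplified illustration, not the paper's algorithm: the hand-built `digit_probs` function stands in for the per-digit distributions a real implementation would read from the model's logits.

```python
import itertools
import math

def digit_probs(prefix):
    """Toy stand-in for P(next digit | digits so far) over the tokens "0"-"9".
    A real implementation would compute this from an LLM's logits; this
    hand-built version is purely illustrative and peaks near the number 42."""
    target = 4 if len(prefix) == 0 else 2
    weights = [math.exp(-abs(d - target)) for d in range(10)]
    total = sum(weights)
    return [w / total for w in weights]

def number_pdf(n_digits=2):
    """Compose per-digit conditionals into a distribution over n-digit numbers
    by multiplying the conditional probability of each successive digit."""
    pdf = {}
    for digits in itertools.product(range(10), repeat=n_digits):
        p, prefix = 1.0, ()
        for d in digits:
            p *= digit_probs(prefix)[d]
            prefix += (d,)
        pdf[int("".join(map(str, digits)))] = p
    return pdf

pdf = number_pdf(2)
print("most likely value:", max(pdf, key=pdf.get))       # 42 for this toy model
print("total probability:", round(sum(pdf.values()), 6))  # 1.0
```

Because each conditional sums to one, the composed distribution is properly normalized; with a real model the chief additional work is mapping digit strings to the tokenizer's vocabulary and batching the conditional queries efficiently.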