Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities

Ting-Rui Chiang, Dani Yogatama


Abstract
In this work, we propose a simple theoretical framework, Pelican Soup, that aims to better explain how pretraining allows LLMs to (1) generalize to unseen instructions and (2) perform in-context learning even when the verbalizers are irrelevant to the task. To this end, our framework introduces the notions of a "knowledge base" and a "reference-sense association," along with a simple formalism for natural language processing tasks. The framework shows how studies in linguistics, psychology, and philosophy can inform our understanding of language models, and it connects to several existing theoretical results. To illustrate its use, we derive a bound on the in-context learning loss. Finally, we support the framework with empirical experiments and suggest possible future research directions.
Anthology ID:
2026.findings-eacl.23
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
443–464
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.23/
Cite (ACL):
Ting-Rui Chiang and Dani Yogatama. 2026. Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities. In Findings of the Association for Computational Linguistics: EACL 2026, pages 443–464, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities (Chiang & Yogatama, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.23.pdf
Checklist:
2026.findings-eacl.23.checklist.pdf