Pretraining Language Models for Diachronic Linguistic Change Discovery
Elisabeth Fittschen, Sabrina Xin Li, Tom Lippincott, Leshem Choshen, Craig Messner
Abstract
Large language models (LLMs) are increasingly used as knowledge discovery tools. Humanistic disciplines like historical linguistics and literary studies have shown interest in this capability. These fields often construct arguments on the basis of distinctions between phenomena like time-period or genre. Such methodological investments complicate reliance on LLMs pretrained over large sets of broadly-collected data. We show that efficient pretraining techniques produce useful models of semantic change over modest historical corpora without allowing potential contamination from anachronistic data. We verify that these trained-from-scratch models better respect historical divisions and are more computationally efficient compared to the standard approach of fine-tuning an existing LLM. We compare the trade-offs in general linguistic fluency versus detecting and characterizing various forms of linguistic change, and provide a pipeline implementation of our approach that can be readily adapted and applied to a wide range of diachronic phenomena.
- Anthology ID:
- 2026.findings-eacl.241
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2026
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4627–4642
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.241/
- Cite (ACL):
- Elisabeth Fittschen, Sabrina Xin Li, Tom Lippincott, Leshem Choshen, and Craig Messner. 2026. Pretraining Language Models for Diachronic Linguistic Change Discovery. In Findings of the Association for Computational Linguistics: EACL 2026, pages 4627–4642, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Pretraining Language Models for Diachronic Linguistic Change Discovery (Fittschen et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.241.pdf