DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling

Mariia Fedorova, Andrey Kutuzov, Khonzoda Umarova


Abstract
In this resource paper, we present DHPLT, an open collection of diachronic corpora in 41 diverse languages. DHPLT is based on the web-crawled HPLT datasets; we use web crawl timestamps as an approximate signal of document creation time. The collection covers three time periods: 2011–2015, 2020–2021, and 2024–present (1 million documents per time period for each language). We additionally provide pre-computed word type and token embeddings, as well as lexical substitutions for our chosen target words, while leaving it open for other researchers to define their own target words using the same datasets. DHPLT aims to fill the current lack of multilingual diachronic corpora for semantic change modelling (beyond a dozen high-resource languages) and opens the way for a variety of new experimental setups in this field.
Anthology ID:
2026.lchange-1.7
Volume:
The Proceedings for the 6th International Workshop on Computational Approaches to Language Change (LChange’26)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Nina Tahmasebi, Pierluigi Cassotti, Syrielle Montariol, Andrey Kutuzov, Netta Huebscher, Elena Spaziani, Naomi Baes
Venue:
LChange
Publisher:
Association for Computational Linguistics
Pages:
87–96
URL:
https://preview.aclanthology.org/ingest-eacl/2026.lchange-1.7/
Cite (ACL):
Mariia Fedorova, Andrey Kutuzov, and Khonzoda Umarova. 2026. DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling. In The Proceedings for the 6th International Workshop on Computational Approaches to Language Change (LChange’26), pages 87–96, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling (Fedorova et al., LChange 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.lchange-1.7.pdf