The Proceedings for the 6th International Workshop on Computational Approaches to Language Change (LChange’26)
Nina Tahmasebi, Pierluigi Cassotti, Syrielle Montariol, Andrey Kutuzov, Netta Huebscher, Elena Spaziani, Naomi Baes (Editors)
- Anthology ID: 2026.lchange-1
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Venue: LChange
- SIG:
- Publisher: Association for Computational Linguistics
- URL: https://preview.aclanthology.org/ingest-eacl/2026.lchange-1/
- DOI:
- ISBN: 979-8-89176-362-3
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.lchange-1.pdf
The SlangTrack Dataset: Supporting the Detection of Words Used in Slang Senses
Afnan Mohammed Aloraini | Riza Batista-Navarro | Goran Nenadic | Viktor Schlegel
Slang is widespread in informal communication, yet its fluidity poses challenges for natural language processing (NLP), especially when words alternate between slang and non-slang senses. While prior work has examined slang through dictionaries, sentiment analysis, and lexicon building, little attention has been given to detecting slang usage in context. We address this gap by reframing slang detection as distinguishing slang from non-slang senses of the same lexical item. To support this task, we introduce SlangTrack (ST), a diachronically structured dataset of dual-meaning words annotated at the sentence level with high inter-annotator agreement. We benchmark (1) deep learning models with static and contextual embeddings, (2) transformer-based models, and (3) large language models evaluated in zero-shot, few-shot, and fine-tuned settings. Fine-tuned transformers, especially BERT-large enriched with sentiment and emotion features, achieve the strongest performance, reaching an F1-score of 72% for slang and 92% for non-slang usage. Our findings highlight both the difficulty of contextual slang detection and the value of affective cues for improving model robustness.
Statistical Semantic Change Detection via Usage Similarities
Taichi Aida | Daichi Mochihashi | Hiroya Takamura | Toshinobu Ogiso | Mamoru Komachi
Semantic change detection comprises two subtasks: classification, which predicts whether a target word has undergone a semantic shift, and ranking, which orders words according to the degree of their semantic change. While most prior studies have concentrated on the ranking subtask, the classification subtask plays an equally important role, since many practical scenarios require a yes/no decision on semantic change rather than a global ranking. In this work, we propose a novel statistical method that predicts the presence or absence of semantic change. While most existing approaches infer semantic change by comparing word embeddings across time periods or domains, our method directly models the diachronic/synchronic consistency of usage-level similarity scores. Our experiments on SemEval-2020 Task 1 and WUGS datasets demonstrate that the proposed formulation outperforms existing state-of-the-art embedding-based methods and robustly detects semantic change across languages in both diachronic and synchronic settings.
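The core signal here, consistency of usage-level similarity scores rather than direct embedding comparison, can be illustrated with a toy sketch. This is not the authors' statistical model: `similarity_gap`, the gap score, and the toy arrays are our own illustration, assuming usages are rows of embedding matrices.

```python
import numpy as np

def cosine_sims(A, B):
    """All pairwise cosine similarities between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T).ravel()

def similarity_gap(usages_t1, usages_t2):
    """Illustrative change score: how far cross-period usage similarity
    falls below within-period similarity (near 0 when usage is stable)."""
    within = np.concatenate([cosine_sims(usages_t1, usages_t1),
                             cosine_sims(usages_t2, usages_t2)])
    across = cosine_sims(usages_t1, usages_t2)
    return float(within.mean() - across.mean())

rng = np.random.default_rng(0)
stable = rng.normal(0, 1, (20, 8))                   # usages of an unchanged word
perturbed = stable + rng.normal(0, 0.01, (20, 8))    # near-identical second period
shifted = stable + 5.0                               # usages displaced to a new region

gap_same = similarity_gap(stable, perturbed)
gap_shift = similarity_gap(stable, shifted)
assert gap_shift > gap_same
```

A yes/no decision then becomes a statistical test on such similarity scores rather than a threshold on an embedding distance.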
Tonogenesis—the historical process by which segmental contrasts evolve into lexical tone—has traditionally been studied through comparative reconstruction and acoustic phonetics. We introduce a computational approach that quantifies the functional role of pitch at different stages of this sound change by measuring how pitch manipulation affects automatic speech recognition (ASR) performance. By analyzing sensitivity to pitch-flattening across a set of closely related Tibetan languages, we find evidence of a tonogenesis continuum: atonal Amdo dialects tolerate pitch removal the most, while fully tonal Ü-Tsang varieties show severe degradation, and intermediate Kham dialects fall measurably between these extremes. These gradient effects demonstrate how ASR models implicitly learn the shifting functional load of pitch as languages transition from consonant-based to tone-based lexical contrasts. Our findings show that computational methods can capture fine-grained stages of sound change and suggest that traditional functional load metrics, based solely on minimal pairs, may overestimate pitch dependence in transitional systems where segmental and suprasegmental cues remain phonetically intertwined.
Cross-lingual Lexical Semantic Change in Romance Languages
Ana Sabina Uban | Liviu P Dinu | Anca Daniela Dinu | Simona Georgescu
We present a comprehensive quantitative analysis of lexical semantic change in the five main Romance languages (Romanian, Italian, Spanish, French and Portuguese), based on the most exhaustive database of related words in these languages. We include both cognate words and borrowings (for the first time, to our knowledge), and compute semantic shift measures using different static and contextual embedding models, as well as three different corpora. We publish the obtained lists of semantic divergences across all related word pairs, compute global trends in language-level semantic divergence, and provide insights on particular study cases of highly stable and highly divergent words for different language pairs.
Threshold-Calibrated Word Sense Disambiguation: Semantic Broadening Without Sense Redistribution in Schizophrenia
Naomi Baes | Nick Haslam
Polysemous words pose a challenge for computational approaches to language change. We extend a recent hypothesis-driven, prototype-based framework to estimate word sense prevalence in diachronic text corpora and apply it to 109,940 usages of schizophrenia drawn from U.S. news media (1985–2025). Our extensions include a contextual dispersion measure (Breadth), robust prototype construction, and human-calibrated prototype-similarity thresholds for conservative sense assignment at scale. Across four decades, distributional semantic change indices commonly used in lexical semantic change detection (LSCD) show significant increases in Breadth and baseline-relative semantic drift (APD), while changes in the central usage prototype (PRT) are influenced by term frequency. In contrast, threshold-calibrated sense assignments reveal stable sense proportions: the psychiatric sense remains dominant, with split-personality and metaphorical senses consistently marginal. Together, these results demonstrate that dispersion- and drift-based LSCD metrics can increase even under stable sense prevalence, indicating that such increases can occur without sense redistribution and primarily reflect broad shifts in usage distributions rather than evidence of polysemization or sense loss. We introduce a threshold-calibrated, prototype-based sense-tracking pipeline that enables conservative sense prevalence estimation at scale and clarifies whether rising distributional LSCD metrics reflect sense redistribution or increasing contextual diversity when historical sense annotation is limited.
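The threshold-calibrated sense-assignment step can be sketched as follows. This is a simplified illustration, not the paper's pipeline: the prototypes here are plain means of seed usages (the paper's construction is more robust), and the function names, toy vectors, and the 0.6 default threshold are our own.

```python
import numpy as np

def unit(v):
    """L2-normalize a vector or each row of a matrix."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def build_prototypes(seed_embeddings):
    """One prototype per sense: mean of labeled seed usage embeddings."""
    return {sense: unit(np.mean(embs, axis=0))
            for sense, embs in seed_embeddings.items()}

def assign_sense(usage, prototypes, threshold=0.6):
    """Assign a usage to its most similar sense prototype, but only when
    similarity clears a (human-calibrated) threshold; otherwise abstain."""
    u = unit(usage)
    sims = {sense: float(u @ p) for sense, p in prototypes.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None

protos = build_prototypes({
    "psychiatric":  np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]),
    "metaphorical": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
})
assert assign_sense(np.array([0.95, 0.05, 0.0]), protos) == "psychiatric"
assert assign_sense(np.array([0.5, 0.5, 0.7]), protos, threshold=0.9) is None
```

The abstain option is what makes the estimation conservative: ambiguous usages are left unassigned rather than forced into a sense, so prevalence estimates only count confident assignments.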
Using Correspondence Patterns to Identify Irregular Words in Cognate Sets Through Leave-One-Out Validation
Frederic Blum | Johann-Mattis List
Regular sound correspondences constitute the principal evidence in historical language comparison. Despite this heuristic focus on regularity, regularity is typically assessed by intuitive judgement rather than quantified evaluation, and irregularity is more common than the Neogrammarian model would predict. Given the recent progress of computational methods in historical linguistics and the increased availability of standardized lexical data, we are now able to improve our workflows and provide such a quantitative evaluation. Here, we present the balanced average recurrence of correspondence patterns as a new measure of regularity. We also present a new computational method that uses this measure to identify cognate sets that lack regularity with respect to their correspondence patterns. We validate the method through two experiments, using simulated and real data. In the experiments, we employ leave-one-out validation to measure the regularity of cognate sets in which one word form has been replaced by an irregular one, checking how well our method identifies the forms causing the irregularity. Our method achieves an overall accuracy of 85% on the datasets based on real data. We also show the benefits of working with subsamples of large datasets and how increasing irregularity in the data influences our results. Reflecting on the broader potential of our new regularity measure and the irregular cognate identification method based on it, we conclude that they could play an important role in improving the quality of existing and future datasets in computer-assisted language comparison.
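The leave-one-out idea can be illustrated with a toy example. The recurrence check below is a deliberate simplification, not the paper's balanced average recurrence measure, and the data and names (`loo_irregular`, the three-language cognate sets) are invented for illustration.

```python
# Toy aligned cognate sets across three languages (one segment per slot).
# A correspondence pattern is the tuple of segments in one alignment column.
cognates = {
    "FISH":  ["p", "f", "f"],
    "FOOT":  ["p", "f", "f"],
    "FIRE":  ["p", "f", "f"],
    "FIELD": ["p", "b", "f"],  # "b" is the irregular intruder
}

def loo_irregular(sets):
    """Leave-one-out check: for each cognate set, drop one language slot and
    count how often the remaining partial pattern recurs across all sets.
    If a set's full pattern is unique but dropping one form makes it highly
    recurrent, that left-out form is flagged as the likely irregular one."""
    flags = []
    for concept, forms in sets.items():
        full = tuple(forms)
        full_support = sum(1 for other in sets.values() if tuple(other) == full)
        for i in range(len(forms)):
            partial = tuple(f for j, f in enumerate(forms) if j != i)
            partial_support = sum(
                1 for other in sets.values()
                if tuple(f for j, f in enumerate(other) if j != i) == partial
            )
            if full_support == 1 and partial_support > full_support:
                flags.append((concept, i))
    return flags

assert loo_irregular(cognates) == [("FIELD", 1)]
```

Replacing one form and checking whether the method recovers exactly that form is the validation setup the abstract describes, applied here at toy scale.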
DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling
Mariia Fedorova | Andrey Kutuzov | Khonzoda Umarova
In this resource paper, we present DHPLT, an open collection of diachronic corpora in 41 diverse languages. DHPLT is based on the web-crawled HPLT datasets; we use web crawl timestamps as an approximate signal of document creation time. The collection covers three time periods: 2011-2015, 2020-2021 and 2024-present (1 million documents per time period for each language). We additionally provide pre-computed word type and token embeddings and lexical substitutions for our chosen target words, while leaving it open for other researchers to choose their own target words using the same datasets. DHPLT aims to address the current lack of multilingual diachronic corpora for semantic change modelling (beyond a dozen high-resource languages). It opens the way for a variety of new experimental setups in this field.
Transparent Semantic Change Detection with Dependency-Based Profiles
Bach Phan Tat | Kris Heylen | Dirk Geeraerts | Stefano De Pascale | Dirk Speelman
Most modern computational approaches to lexical semantic change detection (LSC) rely on embedding-based distributional word representations with neural networks. Despite their strong performance on LSC benchmarks, these models are often opaque. We investigate an alternative method which relies purely on dependency co-occurrence patterns of words. We demonstrate that it is effective for semantic change detection and even outperforms a number of distributional semantic models. We provide an in-depth quantitative and qualitative analysis of the predictions, showing that they are plausible and interpretable.
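A minimal sketch of the dependency-profile idea (not the authors' implementation; the triples, feature encoding, and change score are our own illustration): a word's profile is a sparse count vector over its dependency contexts, and change is distance between profiles from two periods. Because features are symbolic, the top diverging features directly explain a detected change.

```python
from collections import Counter

def profile(triples, target):
    """Dependency profile: counts of (relation, co-occurring word) features
    for the target word, from (head, relation, dependent) triples."""
    feats = Counter()
    for head, rel, dep in triples:
        if head == target:
            feats[(rel, dep)] += 1
        elif dep == target:
            feats[(rel + "^-1", head)] += 1   # inverse relation when target is the dependent
    return feats

def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    keys = set(p) | set(q)
    dot = sum(p[k] * q[k] for k in keys)
    norm = (sum(v * v for v in p.values()) ** 0.5) * (sum(v * v for v in q.values()) ** 0.5)
    return dot / norm if norm else 0.0

# "broadcast" shifting from sowing seed to transmitting programmes (toy triples)
old = profile([("broadcast", "obj", "seed"), ("broadcast", "obj", "seed"),
               ("farmer", "nsubj", "broadcast")], "broadcast")
new = profile([("broadcast", "obj", "programme"),
               ("station", "nsubj", "broadcast")], "broadcast")
change_score = 1.0 - cosine(old, new)
assert 0.0 < change_score <= 1.0
```

Unlike a neural embedding, each coordinate here is a readable context like `("obj", "seed")`, which is where the transparency claim comes from.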
Semantic Change Characterization with LLMs using Rhetorics
Jáder Martins Camboim de Sá | Jooyoung Lee | Marcos Da Silveira | Cedric Pruski
Languages continually evolve in response to societal events, resulting in new terms and shifts in meanings. These changes have significant implications for computer applications, including automatic translation and chatbots, making it essential to characterize them accurately. The recent development of LLMs has notably advanced natural language understanding, particularly in sense inference and reasoning. In this paper, we investigate the potential of LLMs in characterizing three types of semantic change: dimension, relation, and orientation. We achieve this by combining LLMs’ Chain-of-Thought with rhetorical devices and conducting an experimental assessment of our approach using newly created datasets. Our results highlight the effectiveness of LLMs in capturing and analyzing semantic changes, providing valuable insights to improve computational linguistic applications.
This paper presents a semi-supervised approach to investigating lexical semantic change in English prepositions using contextualized word embeddings from BERT. Due to their hybrid lexico-grammatical nature and high degree of polysemy, prepositions have received limited attention in computational studies of semantic change. We address this gap by first applying BERT-based embeddings in combination with a k-nearest neighbors classifier to the task of preposition sense disambiguation, achieving competitive performance without relying on external lexical resources. The trained model is then applied to diachronic data from the Corpus of Historical American English to analyze semantic change over time. By measuring classifier confidence and correlating it with usage year, we detect systematic differences between simple and compound prepositions. Our results confirm linguistic hypotheses that simple prepositions remain largely semantically stable, while compound prepositions exhibit measurable semantic change. The study demonstrates that BERT embeddings provide an effective tool for exploring diachronic semantic phenomena in functionally complex word classes and can be extended to other languages and datasets.
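The disambiguation step, a k-nearest-neighbors classifier over contextual embeddings with vote share as a confidence proxy, might be sketched as follows. The toy 2-d vectors stand in for BERT embeddings of the preposition token, and `knn_predict` is our own minimal implementation, not the paper's code.

```python
import numpy as np

def knn_predict(train_X, train_y, usage, k=3):
    """Nearest-neighbour sense prediction with vote share as confidence."""
    dists = np.linalg.norm(train_X - usage, axis=1)
    votes = [train_y[i] for i in np.argsort(dists)[:k]]
    best = max(set(votes), key=votes.count)
    return best, votes.count(best) / k

# Toy stand-ins for contextual embeddings of the preposition "over";
# in practice these would be BERT hidden states of the preposition token.
train_X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_y = ["spatial", "spatial", "temporal", "temporal"]  # annotated senses

sense, confidence = knn_predict(train_X, train_y, np.array([0.8, 0.2]))
assert sense == "spatial" and confidence >= 2 / 3
```

Tracking how such confidence values correlate with the year of each usage is then a simple way to quantify gradual sense drift, as the abstract describes.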
A Computational Analysis of the Emergence of Therapy-speak in Social Media
Alina Iacob | Ana Sabina Uban
The present article investigates semantic change in psychology-related concepts, comparing scientific and social media texts. We assess patterns of change over 15 years (2010-2025) and compare word usage in a corpus of Psychology journal abstracts and Reddit comments, testing whether specialized communities on social media align with psychology experts. We analyze the evolution of semantic breadth, semantic displacement, and nearest-neighbour similarity, and include contextual embeddings alongside static Word2Vec embeddings in our experiments. Our results reveal diverse patterns of semantic change across the examined concepts and confirm that many terms are used differently on social media compared to specialized literature. Furthermore, Reddit communities focused on psychology discussions occupy an intermediate position, adopting a more objective stance than general-domain threads while remaining distinct from specialized literature.
Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and cosine distance over word prototypes (PRT). We introduce Average Minimum Distance (AMD) and Symmetric Average Minimum Distance (SAMD), new measures that quantify semantic change via local correspondence between word usages across time periods. Across multiple languages, encoder models, and representation spaces, we show that AMD often provides more robust performance, particularly under dimensionality reduction and with non-specialised encoders, while SAMD excels with specialised encoders. We suggest that LSCD may benefit from considering alternative semantic change metrics beyond APD and PRT, with AMD offering a robust option for contextualised embedding-based analysis.
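Under our reading of the measure names (the precise definitions are the authors'), AMD averages, over usages in one period, the distance to the closest usage in the other period, and SAMD symmetrizes over both directions. A sketch alongside the standard APD and PRT baselines, assuming usage embeddings as rows of NumPy arrays:

```python
import numpy as np

def cdist_cos(A, B):
    """Cosine distance matrix between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T

def apd(X1, X2):
    """Average Pairwise Distance between usages of two periods."""
    return float(cdist_cos(X1, X2).mean())

def prt(X1, X2):
    """Cosine distance between period prototypes (mean usage vectors)."""
    return float(cdist_cos(X1.mean(0, keepdims=True), X2.mean(0, keepdims=True))[0, 0])

def amd(X1, X2):
    """Average Minimum Distance: each usage in X1 matched to its closest
    usage in X2 (directional; our reading of the measure's name)."""
    return float(cdist_cos(X1, X2).min(axis=1).mean())

def samd(X1, X2):
    """Symmetric AMD: average of both matching directions."""
    return (amd(X1, X2) + amd(X2, X1)) / 2

rng = np.random.default_rng(1)
X1 = rng.normal(0, 1, (30, 16))
assert abs(amd(X1, X1)) < 1e-9          # identical usage sets: no change signal
assert samd(X1, X1 + 10.0) > samd(X1, X1)  # displaced usages score higher
```

The local matching is what distinguishes AMD from APD: APD averages over all cross-period pairs, while AMD only penalizes usages that have no close counterpart in the other period.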
From sunblock to softblock: Analyzing the correlates of neology in published writing and on social media
Maria Ryskina | Matthew R. Gormley | Kyle Mahowald | David R. Mortensen | Taylor Berg-Kirkpatrick | Vivek Kulkarni
Living languages are shaped by a host of conflicting internal and external evolutionary pressures. While some of these pressures are universal across languages and cultures, others differ depending on the social and conversational context: language use in newspapers is subject to very different constraints than language use on social media. Prior distributional semantic work on English word emergence *(neology)* identified two factors correlated with creation of new words by analyzing a corpus consisting primarily of historical published texts [(Ryskina et al., 2020)](https://aclanthology.org/2020.scil-1.43/). Extending this methodology to contextual embeddings in addition to static ones and applying it to a new corpus of Twitter posts, we show that the same findings hold for both domains, though the topic popularity growth factor may contribute less to neology on Twitter than in published writing. We hypothesize that this difference can be explained by the two domains favouring different word formation mechanisms.