Saskia Peels


2026

We present DRiFT (Debates on Reddit involving Food Transition), a new large-scale corpus and set of computational methods for using language as an early indicator of social change in the protein transition, i.e., the shift from a diet predominantly based on animal proteins to one based mainly on plant sources. DRiFT comprises 17.5M Reddit comments (2010–2022) from 29 subreddits grouped into two speaker communities: SUSTAINABLE (early adopters/innovators) and GENERIC (general public). Building on neologism analysis, lexical semantic change detection, and connotative profiling, we introduce three linguistic measures of innovation awareness, meaning shift, and attitudinal valence. We extract neonyms and retronyms to quantify awareness; apply static and contextual embedding-based Lexical Semantic Change methods (PPMI, SGNS, BERT substitutions) to probe semantic reconceptualization; and adapt an embedding-based connotation hyperplane to measure polarity changes for targeted terms. Results show marked diastratic differences: SUSTAINABLE users both use innovation-specific lexicon more frequently and have reconceptualized core food terms in ethical/environmental frames, while the GENERIC community exhibits rapid proportional growth in neologism use and emerging positive connotations for some plant-based products. Diachronic denotational shifts over the 12-year window are weak, suggesting that embedding-based methods fall short in capturing subtle meaning changes. DRiFT and our analyses demonstrate that language can function as a sensitive "thermometer" of subtle social change, revealing attitudinal dynamics before observable behavioral shifts.
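As a rough illustration of the count-based (PPMI) comparison mentioned above, the following minimal sketch builds sparse PPMI vectors for one target word in two toy corpora and measures their cosine distance. It is not the authors' pipeline: the window size, the toy sentences, and the function names (`cooccurrence_counts`, `ppmi_vector`, `cosine_distance`) are all assumptions made for this example. Because PPMI dimensions are context words, vectors from different corpora can be compared without an alignment step.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count symmetric (word, context) pairs within a fixed window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def ppmi_vector(target, counts):
    """Sparse PPMI vector for `target`, keeping only positive PMI values."""
    total = sum(counts.values())
    word_total = Counter()
    context_total = Counter()
    for (word, context), n in counts.items():
        word_total[word] += n
        context_total[context] += n
    vector = {}
    for (word, context), n in counts.items():
        if word == target:
            pmi = math.log2(n * total / (word_total[word] * context_total[context]))
            if pmi > 0:
                vector[context] = pmi
    return vector


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (1.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Toy, made-up sentences standing in for the two speaker communities.
sustainable = [["meat", "harms", "animals"], ["meat", "harms", "climate"]]
generic = [["meat", "tastes", "great"], ["grilled", "meat", "tastes", "great"]]

vec_sustainable = ppmi_vector("meat", cooccurrence_counts(sustainable))
vec_generic = ppmi_vector("meat", cooccurrence_counts(generic))
distance = cosine_distance(vec_sustainable, vec_generic)
```

A larger distance indicates that the target word keeps different company in the two corpora; on real data one would add frequency thresholds and compare against control words before reading anything into the score.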

2023

We evaluate four count-based and predictive distributional semantic models of Ancient Greek against AGREE, a composite benchmark of human judgements, to assess their ability to retrieve semantic relatedness. Based on observations from our analysis of the results, we design a procedure for a larger-scale intrinsic evaluation of count-based and predictive language models, including syntactic embeddings. We also propose possible ways of exploiting the different layers of the whole AGREE benchmark (including both human- and machine-generated data) and different evaluation metrics.
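One common way to score a model against human relatedness judgements is a rank correlation between the model's similarity scores and the human ratings for the same word pairs. The sketch below computes Spearman's rho in plain Python. The abstract does not state which metrics are used with AGREE, so the choice of Spearman's rho, the function names, and the numeric scores are assumptions for illustration only.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five word pairs (not taken from AGREE).
human_ratings = [4.5, 1.0, 3.0, 2.0, 5.0]
model_similarities = [0.81, 0.10, 0.42, 0.47, 0.90]
rho = spearman(human_ratings, model_similarities)
```

Rank correlation only asks whether the model orders the pairs as the annotators do, which sidesteps the fact that cosine similarities and human rating scales are not directly comparable.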