Evangelia Zve

2026

From Noise to Signal: When Outliers Seed New Topics
Evangelia Zve | Gauvain Bourgne | Benjamin Icard | Jean-Gabriel Ganascia
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Outliers in dynamic topic modeling are often discarded as noise, yet some act as early signals of emerging topics. We introduce a temporal taxonomy of news document trajectories that distinguishes anticipatory outliers, documents that appear before a topic forms but later integrate into it, from those that reinforce existing topics or remain isolated. This taxonomy bridges weak-signal detection and dynamic topic modeling, clarifying how individual articles anticipate, initiate, or drift within evolving clusters. We implement it within a cumulative clustering framework using document- embeddings from eleven state-of-the-art language models and apply it retrospectively to HydroNewsFr, a French news corpus on the hydrogen economy curated for this study. Inter-model agreement on anticipatory outliers indicates that a small high-agreement subset yields robust confidence estimates. Complementary qualitative case studies further demonstrate their potential value as early indicators of emerging narratives. All reproducibility materials and results are available at https://anonymous.4open.science/status/lrec_from_noise_to_signal-B721.

2025

pdf bib abs

Représenter le style au-delà des thématiques : une étude d’impact sur la dispersion vectorielle de différents modèles de langage
Benjamin Icard | Evangelia Zve | Lila Sainero | Alice Breton | Jean-Gabriel Ganascia
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : traductions d'articles publiés

Cet article vise à étudier comment le style d’écriture influence la dispersion des plongements vectoriels de divers grands modèles de langage. Alors que les premiers modèles de type transformeur étaient prin- cipalement axés sur la modélisation thématique, cette étude examine le rôle du style d’écriture dans la configuration de l’espace vectoriel. À partir d’un corpus littéraire faisant varier thématiques et styles, nous comparons la sensibilité des modèles de langage en français et en anglais. En analysant ainsi l’impact spécifique du style sur la dispersion vectorielle, nous cherchons à mieux comprendre com- ment les modèles de langage traitent l’information stylistique, contribuant ainsi à leur interprétabilité globale. Ceci est un résumé de l’article “Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models” publié dans les actes de la conférence COLING 2025 (Icard et al., 2025) et accessible à l’URL : https://aclanthology.org/2025.coling-main.236/.

pdf bib abs

Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models
Benjamin Icard | Evangelia Zve | Lila Sainero | Alice Breton | Jean-Gabriel Ganascia
Proceedings of the 31st International Conference on Computational Linguistics

This paper analyzes how writing style affects the dispersion of embedding vectors across multiple, state-of-the-art language models. While early transformer models primarily aligned with topic modeling, this study examines the role of writing style in shaping embedding spaces. Using a literary corpus that alternates between topics and styles, we compare the sensitivity of language models across French and English. By analyzing the particular impact of style on embedding dispersion, we aim to better understand how language models process stylistic information, contributing to their overall interpretability.

pdf bib

From Outliers to Topics in Language Models: Anticipating Trends in News Corpora
Evangelia Zve | Benjamin Icard | Alice Breton | Lila Sainero | Gauvain Bourgne | Jean-Gabriel Ganascia
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)

Co-authors

Venues

Fix author