From Noise to Signal: When Outliers Seed New Topics
Evangelia Zve, Gauvain Bourgne, Benjamin Icard, Jean-Gabriel Ganascia
Abstract
Outliers in dynamic topic modeling are often discarded as noise, yet some act as early signals of emerging topics. We introduce a temporal taxonomy of news document trajectories that distinguishes anticipatory outliers, documents that appear before a topic forms but later integrate into it, from those that reinforce existing topics or remain isolated. This taxonomy bridges weak-signal detection and dynamic topic modeling, clarifying how individual articles anticipate, initiate, or drift within evolving clusters. We implement it within a cumulative clustering framework using document embeddings from eleven state-of-the-art language models and apply it retrospectively to HydroNewsFr, a French news corpus on the hydrogen economy curated for this study. Inter-model agreement on anticipatory outliers indicates that a small high-agreement subset yields robust confidence estimates. Complementary qualitative case studies further demonstrate their potential value as early indicators of emerging narratives. All reproducibility materials and results are available at https://anonymous.4open.science/status/lrec_from_noise_to_signal-B721.
- Anthology ID: 2026.lrec-main.596
- Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month: May
- Year: 2026
- Address: Palma de Mallorca, Spain
- Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue: LREC
- Publisher: ELRA Language Resource Association
- Pages: 7523–7533
- URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.596/
- Cite (ACL): Evangelia Zve, Gauvain Bourgne, Benjamin Icard, and Jean-Gabriel Ganascia. 2026. From Noise to Signal: When Outliers Seed New Topics. International Conference on Language Resources and Evaluation, main:7523–7533.
- Cite (Informal): From Noise to Signal: When Outliers Seed New Topics (Zve et al., LREC 2026)
- PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.596.pdf