Gauvain Bourgne


2026

Outliers in dynamic topic modeling are often discarded as noise, yet some act as early signals of emerging topics. We introduce a temporal taxonomy of news document trajectories that distinguishes anticipatory outliers, documents that appear before a topic forms but later integrate into it, from those that reinforce existing topics or remain isolated. This taxonomy bridges weak-signal detection and dynamic topic modeling, clarifying how individual articles anticipate, initiate, or drift within evolving clusters. We implement it within a cumulative clustering framework using document embeddings from eleven state-of-the-art language models and apply it retrospectively to HydroNewsFr, a French news corpus on the hydrogen economy curated for this study. Inter-model agreement on anticipatory outliers indicates that a small high-agreement subset yields robust confidence estimates. Complementary qualitative case studies further demonstrate the potential value of these outliers as early indicators of emerging narratives. All reproducibility materials and results are available at https://anonymous.4open.science/status/lrec_from_noise_to_signal-B721.
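To make the taxonomy concrete, the following minimal sketch shows one way a document's trajectory could be labelled and how inter-model agreement could be scored. It is an illustration under stated assumptions, not the authors' implementation: the three category names, the use of -1 as the outlier label (a convention borrowed from density-based clusterers), and the function names `classify_trajectory` and `anticipatory_agreement` are all hypothetical, and the full taxonomy may contain finer distinctions than the three cases shown here.

```python
from typing import Dict, Sequence

# Assumption: -1 marks a document that no topic absorbed in a given window.
OUTLIER = -1


def classify_trajectory(labels: Sequence[int]) -> str:
    """Label one document from its cluster assignments over time.

    `labels[i]` is the document's cluster label in the i-th cumulative
    clustering window, counted from the window in which it first appears.
    """
    if not labels:
        raise ValueError("a trajectory needs at least one window")
    if labels[0] != OUTLIER:
        # The document joins a topic as soon as it appears.
        return "reinforcing"
    if any(label != OUTLIER for label in labels[1:]):
        # Starts as an outlier, is later integrated into a topic.
        return "anticipatory"
    # Never integrated into any topic.
    return "isolated"


def anticipatory_agreement(
    per_model: Dict[str, Dict[str, str]], doc_id: str
) -> float:
    """Fraction of models that label `doc_id` as anticipatory.

    `per_model` maps a model name to a {doc_id: category} mapping.
    Models that did not see the document are ignored.
    """
    votes = [
        categories[doc_id] == "anticipatory"
        for categories in per_model.values()
        if doc_id in categories
    ]
    return sum(votes) / len(votes) if votes else 0.0
```

A document with trajectory `[-1, -1, 3]` would be labelled anticipatory under this sketch, and a high-agreement subset could be selected by thresholding `anticipatory_agreement` across the embedding models.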

2025