Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus

Bhuvanesh Verma, Alexander Mehler


Abstract
Although temporal topic modeling has been widely applied to scientific and legal texts, literary corpora have largely been overlooked. To address this gap, we analyze topic evolution in a subset of the Project Gutenberg (PG) corpus. We model this subset as a sequence of topic networks that capture the emergence, persistence, and interaction of thematic structures over decades. Using supervised topic representations, we predict nodes (topics) and edges (topic pairings) to forecast future topics and their co-occurrence. Our experiments demonstrate moderate to strong temporal persistence in topic connectivity patterns across three topic systems, with ROC-AUC and average precision (AP) values consistently above 0.85. We find that the temporal span of topic networks significantly impacts predictive performance: longer spans improve the stability and recall of topic presence, while shorter spans better capture evolving topic relationships. Overall, our findings demonstrate the predictability of topics in literary texts over time.
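To make the idea of time-sliced topic networks concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the function names, the toy documents, and the naive "the next window repeats the previous one" forecast are all assumptions made for illustration. The abstract reports ROC-AUC and AP from a learned predictor, whereas this sketch only computes precision and recall for the naive forecast.

```python
from collections import defaultdict
from itertools import combinations


def build_topic_networks(docs, span):
    """Group documents into windows of `span` years and return, per window,
    the set of topics (nodes) and the set of co-occurring topic pairs (edges).

    Hypothetical helper for illustration; not taken from the paper."""
    networks = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for year, topics in docs:
        window = (year // span) * span  # start year of the window
        net = networks[window]
        net["nodes"].update(topics)
        # every unordered pair of topics within one document becomes an edge
        net["edges"].update(combinations(sorted(set(topics)), 2))
    return dict(sorted(networks.items()))


def persistence_scores(networks, key):
    """Precision and recall of the naive forecast 'window t+1 repeats window t'.

    `key` is "nodes" (topic occurrence) or "edges" (topic co-occurrence)."""
    windows = list(networks)
    scores = {}
    for prev, nxt in zip(windows, windows[1:]):
        predicted, actual = networks[prev][key], networks[nxt][key]
        hits = len(predicted & actual)
        precision = hits / len(predicted) if predicted else 0.0
        recall = hits / len(actual) if actual else 0.0
        scores[nxt] = (precision, recall)
    return scores


# Toy, invented data: (publication year, topic labels assigned to one book)
docs = [
    (1850, ["sea", "war"]),
    (1855, ["sea", "love"]),
    (1861, ["sea", "war", "love"]),
    (1868, ["religion", "love"]),
]
networks = build_topic_networks(docs, span=10)
node_scores = persistence_scores(networks, "nodes")
edge_scores = persistence_scores(networks, "edges")
```

On this toy input, the 1860s window gets a node score of (1.0, 0.75) and an edge score of (1.0, 0.5): everything carried over from the 1850s reappears, but the naive forecast misses the newly emerging topic and pairings. Changing `span` regroups the same documents into coarser or finer networks, which is the kind of knob the abstract's remark about temporal span refers to.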
Anthology ID:
2026.lrec-main.65
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
860–869
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.65/
Cite (ACL):
Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. International Conference on Language Resources and Evaluation, main:860–869.
Cite (Informal):
Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus (Verma & Mehler, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.65.pdf