Finding Scientific Topics in Continuously Growing Text Corpora

André Bittermann, Jonas Rieger


Abstract
The ever-growing number of research publications demands computational assistance for anyone trying to keep track of scientific processes. Topic modeling has become a popular approach for finding scientific topics in static collections of research papers. However, the reality of continuously growing corpora of scholarly documents poses a major challenge for traditional approaches. We introduce RollingLDA for the ongoing monitoring of research topics. It models dynamically growing corpora sequentially while keeping the time series derived from the modeled texts consistent over time. We evaluate its capability to detect research topics and present a Shiny app as an easy-to-use interface. In addition, we illustrate usage scenarios for different user groups such as researchers, students, journalists, and policy-makers.
Anthology ID:
2022.sdp-1.2
Volume:
Proceedings of the Third Workshop on Scholarly Document Processing
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
7–18
URL:
https://aclanthology.org/2022.sdp-1.2
Cite (ACL):
André Bittermann and Jonas Rieger. 2022. Finding Scientific Topics in Continuously Growing Text Corpora. In Proceedings of the Third Workshop on Scholarly Document Processing, pages 7–18, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Finding Scientific Topics in Continuously Growing Text Corpora (Bittermann & Rieger, sdp 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.sdp-1.2.pdf
Code:
leibniz-psychology/psychtopics