Danijela Horak


Topic Modeling With Topological Data Analysis
Ciarán Byrne | Danijela Horak | Karo Moilanen | Amandla Mabona
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent unsupervised topic modelling approaches that use clustering techniques on word, token or document embeddings can extract coherent topics. A common limitation of such approaches is that they reveal nothing about inter-topic relationships, which are essential in many real-world application domains. We present an unsupervised topic modelling method which harnesses Topological Data Analysis (TDA) to extract a topological skeleton of the manifold upon which contextualised word embeddings lie. We demonstrate that our approach, which performs on par with a recent baseline, is able to construct a network of coherent topics together with meaningful relationships between them.
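The abstract does not spell out how the topological skeleton is built. A widely used TDA tool for turning a point cloud into a graph-shaped skeleton is the Mapper algorithm. The sketch below is a minimal, pure-Python Mapper-style construction, offered as an illustration under stated assumptions and not as the authors' implementation. It assumes Euclidean distance, a user-supplied one-dimensional lens (filter) value per point, an overlapping interval cover of the lens range, and single-linkage clustering with a distance threshold `eps` inside each interval. The names `mapper`, `n_intervals`, `overlap` and `eps` are hypothetical. One plausible reading for topic modelling: the points are contextualised word embeddings, each node is a candidate topic, and an edge links two topics that share words.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into n_intervals equal intervals, each widened by
    overlap * length in total (half on each side) so neighbours overlap."""
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Connected components of the graph joining points at distance <= eps
    (single-linkage clustering cut at eps), via union-find."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if euclidean(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=4, overlap=0.3, eps=1.0):
    """Hypothetical Mapper-style skeleton: returns (nodes, edges), where each
    node is a set of point indices and each edge (u, v) joins two nodes that
    share at least one point."""
    lo, hi = min(lens), max(lens)
    nodes = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        if members:
            nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


if __name__ == "__main__":
    # Toy data: ten points on a straight line, lens = x coordinate.
    pts = [(float(x), 0.0) for x in range(10)]
    nodes, edges = mapper(pts, [p[0] for p in pts],
                          n_intervals=3, overlap=0.5, eps=1.0)
    print([sorted(n) for n in nodes])  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
    print(sorted(edges))               # [(0, 1), (1, 2)]
```

On this toy line the skeleton is a path of three nodes, as expected for a one-dimensional shape. Real contextualised embeddings would typically need a learned or projected lens, a different clustering method and tuned parameters. The shared-member rule for edges is what lets a graph of this kind express relationships between groups, not just the groups themselves.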