Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors

Mayank Nagda, Phil Ostheimer, Sophie Fellenz


Abstract
Topic models are a popular approach for extracting semantic information from large document collections. However, recent studies suggest that the topics generated by these models often do not align well with human intentions. Although metadata such as labels and authorship information is available, it has not yet been effectively incorporated into neural topic models. To address this gap, we introduce FANToM, a novel method to align neural topic models with both labels and authorship information. FANToM allows for the inclusion of this metadata when available, producing interpretable topics and author distributions for each topic. Our approach demonstrates greater expressiveness than conventional topic models by learning the alignment between labels, topics, and authors. Experimental results show that FANToM improves existing models in terms of both topic quality and alignment. Additionally, it identifies author interests and similarities.
Anthology ID:
2025.findings-naacl.44
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
740–760
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.44/
Cite (ACL):
Mayank Nagda, Phil Ostheimer, and Sophie Fellenz. 2025. Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 740–760, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Tethering Broken Themes: Aligning Neural Topic Models with Labels and Authors (Nagda et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.44.pdf