Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics

Seyedeh Fatemeh Ebrahimi, Jaakko Peltonen


Abstract
Topic models often fail to capture low-prevalence, domain-critical themes (so-called minority topics), such as mental health themes in online comments. While some existing methods can incorporate domain knowledge such as expected topical content, methods allowing guidance may require overly detailed expected topics, hindering the discovery of topic divisions and variation. We propose a topic modeling solution via a specially constrained non-negative matrix factorization (NMF). We incorporate a seed word list characterizing minority content of interest, but we do not require experts to pre-specify the division of seed words across minority topics. Through prevalence constraints on minority topics and seed word content across topics, we learn distinct data-driven minority topics as well as majority topics. The constrained NMF is fitted via Karush-Kuhn-Tucker (KKT) conditions with multiplicative updates. On synthetic data, we outperform several baselines in terms of topic purity and normalized mutual information, and we also evaluate topic quality using Jensen-Shannon divergence (JSD). We conduct a case study on YouTube vlog comments, analyzing viewer discussion of mental health content; our model successfully identifies and reveals this domain-relevant minority content.
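For readers unfamiliar with multiplicative updates, the following is a minimal sketch of the standard, unconstrained NMF updates for the squared Frobenius loss (the classic Lee and Seung rules), which can themselves be derived from KKT conditions. It is not the constrained method of the paper: the prevalence constraints and the seed word handling described in the abstract are not modeled here. The function names `nmf_multiplicative` and `frobenius_loss`, the parameter names, and the small constant `eps` are assumptions chosen for illustration.

```python
import numpy as np


def nmf_multiplicative(X, n_topics, n_iter=200, eps=1e-10, seed=0):
    """Factorize a non-negative matrix X (docs x words) as W @ H.

    W (docs x topics) holds document-topic weights and
    H (topics x words) holds topic-word weights.
    This is plain, unconstrained NMF, shown only as an illustration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_words)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a non-negative ratio,
        # so non-negativity is preserved without any projection step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def frobenius_loss(X, W, H):
    """Squared Frobenius reconstruction error ||X - W H||^2."""
    return float(np.linalg.norm(X - W @ H) ** 2)
```

Calling `nmf_multiplicative(X, n_topics=4)` on a non-negative document-term matrix `X` returns the two factors; the rows of `H` can then be read as unnormalized topic-word weights.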
Anthology ID:
2025.emnlp-main.1802
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
35561–35586
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1802/
Cite (ACL):
Seyedeh Fatemeh Ebrahimi and Jaakko Peltonen. 2025. Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35561–35586, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics (Ebrahimi & Peltonen, EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1802.pdf
Checklist:
 2025.emnlp-main.1802.checklist.pdf