Self-Supervised Neural Topic Modeling

Abstract
Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text. Most topic models rely on word co-occurrence to compute a topic, i.e., a weighted set of words that together represent a high-level semantic concept. In this paper, we propose a new lightweight Self-Supervised Neural Topic Model (SNTM) that learns rich context by jointly learning a topic representation from a triple of co-occurring words and the document the triple originates from. Our experimental results indicate that our proposed neural topic model, SNTM, outperforms existing topic models on coherence metrics as well as document clustering accuracy. Moreover, beyond topic coherence and clustering performance, the proposed neural topic model has further advantages: it is computationally efficient and easy to train.
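The abstract describes the core idea only at a high level: a topic representation is learned jointly from a word triple and its source document, with a self-supervised reconstruction signal. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation; all module names, layer sizes, and the bag-of-words reconstruction objective are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: encode a triple of co-occurring words together with the
# bag-of-words (BoW) vector of the document they come from, map the joint
# encoding to a topic mixture, and reconstruct the document's words from it.
class TopicSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, num_topics=50):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_dim)    # word embeddings
        self.doc_proj = nn.Linear(vocab_size, embed_dim)       # document (BoW) encoder
        self.to_topics = nn.Linear(4 * embed_dim, num_topics)  # triple + doc -> topic mixture
        self.decoder = nn.Linear(num_topics, vocab_size)       # topic mixture -> vocabulary

    def forward(self, triple_ids, doc_bow):
        # triple_ids: (batch, 3) word indices; doc_bow: (batch, vocab_size)
        w = self.word_emb(triple_ids).flatten(1)               # (batch, 3 * embed_dim)
        d = self.doc_proj(doc_bow)                             # (batch, embed_dim)
        theta = torch.softmax(self.to_topics(torch.cat([w, d], dim=1)), dim=1)
        return torch.log_softmax(self.decoder(theta), dim=1)   # log p(word | topics)

# Toy usage: the self-supervised target is reconstructing the document's words.
vocab_size = 1000
model = TopicSketch(vocab_size)
triples = torch.randint(0, vocab_size, (8, 3))
docs = torch.rand(8, vocab_size)
log_probs = model(triples, docs)
loss = -(docs * log_probs).sum(dim=1).mean()  # cross-entropy against BoW counts
loss.backward()
```

Under this reading, no human labels are needed: the co-occurring triple and its document serve as both input and supervision signal, which is consistent with the "self-supervised" framing and the claimed training efficiency.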
- Anthology ID:
- 2021.findings-emnlp.284
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3341–3350
- URL:
- https://aclanthology.org/2021.findings-emnlp.284
- DOI:
- 10.18653/v1/2021.findings-emnlp.284
- Cite (ACL):
- Seyed Ali Bahrainian, Martin Jaggi, and Carsten Eickhoff. 2021. Self-Supervised Neural Topic Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3341–3350, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Self-Supervised Neural Topic Modeling (Bahrainian et al., Findings 2021)
- PDF:
- https://aclanthology.org/2021.findings-emnlp.284.pdf
- Code:
- ali-bahrainian/sntm