Abstract
Despite great scalability on large data and their ability to understand correlations between topics, spectral topic models have not been widely used due to the absence of reliability in real data and lack of practical implementations. This paper aims to solidify the foundations of spectral topic inference and provide a practical implementation for anchor-based topic modeling. Beginning with vocabulary curation, we scrutinize every single inference step with other viable options. We also evaluate our matrix-based approach against popular alternatives including a tensor-based spectral method as well as probabilistic algorithms. Our quantitative and qualitative experiments demonstrate the power of Rectified Anchor Word algorithm in various real datasets, providing a complete guide to practical correlated topic modeling.- Anthology ID:
- D19-1504
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4991–5001
- Language:
- URL:
- https://aclanthology.org/D19-1504
- DOI:
- 10.18653/v1/D19-1504
- Cite (ACL):
- Moontae Lee, Sungjun Cho, David Bindel, and David Mimno. 2019. Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4991–5001, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm (Lee et al., EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/D19-1504.pdf