Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge

Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg


Abstract
While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
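The abstract does not state the objective behind "maximally informative" topics. In the CorEx literature this is measured by total correlation, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), which is zero exactly when the variables are independent. The following is a minimal sketch of that quantity on a toy binary document-word matrix, not the authors' implementation. The data, the word labels, and the function names `entropy` and `total_correlation` are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Estimate TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) from observed rows."""
    columns = list(zip(*rows))                      # one tuple per variable
    marginal = sum(entropy(col) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


# Hypothetical toy data: rows are documents, columns are binary indicators
# for the words "goal", "match", and "stock" (illustrative labels only).
docs = [
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 0),
    (0, 0, 1),
]

tc_all = total_correlation(docs)
tc_goal_match = total_correlation([(r[0], r[1]) for r in docs])
tc_goal_stock = total_correlation([(r[0], r[2]) for r in docs])

print(f"TC(goal, match, stock) = {tc_all:.3f} bits")
print(f"TC(goal, match)        = {tc_goal_match:.3f} bits")
print(f"TC(goal, stock)        = {tc_goal_stock:.3f} bits")
```

In this toy matrix "goal" and "match" always co-occur, so all of the dependence sits in that pair, while "stock" is independent of both. A topic that groups the first two words would account for the full total correlation, which is the intuition behind grouping words by shared information rather than by a generative model.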
Anthology ID:
Q17-1037
Volume:
Transactions of the Association for Computational Linguistics, Volume 5
Year:
2017
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova
Venue:
TACL
Publisher:
MIT Press
Pages:
529–542
URL:
https://aclanthology.org/Q17-1037
DOI:
10.1162/tacl_a_00078
Cite (ACL):
Ryan J. Gallagher, Kyle Reing, David Kale, and Greg Ver Steeg. 2017. Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge. Transactions of the Association for Computational Linguistics, 5:529–542.
Cite (Informal):
Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge (Gallagher et al., TACL 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/Q17-1037.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-1/Q17-1037.mp4
Code:
gregversteeg/corex_topic