Jinjin Chi


2016

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs
Ximing Li | Jinjin Chi | Changchun Li | Jihong Ouyang | Bo Fu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distribution over word types with a multivariate Gaussian distribution on the embedding space. This takes the semantic information of words into account. However, the Euclidean similarity used in Gaussian topics is not an optimal semantic measure for word embeddings. It is widely acknowledged that cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics, and then develop a novel mix-vMF topic model (MvTM). Using public pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model discovers more coherent topics than the state-of-the-art baseline models and achieves competitive classification performance.
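
As a rough illustration of why a vMF mixture fits the cosine measure, the sketch below evaluates the log density of a mixture of vMF components on unit-normalized vectors. It is not the MvTM model or its inference procedure. It assumes 3-dimensional toy vectors so that the vMF normalizer has the closed form kappa / (4 pi sinh(kappa)); real word embeddings have many more dimensions, where the normalizer involves a modified Bessel function. The function names, weights, mean directions, and concentration values are all hypothetical.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere; vMF is defined on unit vectors."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def vmf_log_pdf_3d(x, mu, kappa):
    """Log density of a vMF distribution on the unit sphere in R^3.

    x and mu must be unit vectors and kappa > 0. The data-dependent term
    is kappa * (mu . x), i.e. kappa times the cosine similarity.
    """
    cosine = sum(a * b for a, b in zip(x, mu))
    # log(sinh(kappa)) written to stay stable for large kappa
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    log_norm = math.log(kappa) - math.log(4.0 * math.pi) - log_sinh
    return log_norm + kappa * cosine

def mix_vmf_log_pdf(x, weights, mus, kappas):
    """Log density of a mixture of 3-D vMF components (log-sum-exp)."""
    terms = [math.log(w) + vmf_log_pdf_3d(x, m, k)
             for w, m, k in zip(weights, mus, kappas)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical "topic" with two mean directions (assumed values).
weights = [0.6, 0.4]
mus = [normalize([1.0, 1.0, 0.0]), normalize([0.0, 1.0, 1.0])]
kappas = [5.0, 8.0]

word_vec = normalize([2.0, 1.8, 0.1])  # toy stand-in for a word embedding
score = mix_vmf_log_pdf(word_vec, weights, mus, kappas)
```

Because every vector is normalized before scoring, only its direction matters, which is the property that separates this from a Gaussian topic based on Euclidean distance. Using several components per topic lets one topic cover more than one direction on the sphere.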