Modeling Text using the Continuous Space Topic Model with Pre-Trained Word Embeddings

Seiichi Inoue, Taichi Aida, Mamoru Komachi, Manabu Asai


Abstract
In this study, we propose a model that extends the continuous space topic model (CSTM), which flexibly controls the probability of each word in a document, with pre-trained word embeddings. To develop the proposed model, we pre-train word embeddings that capture word semantics and plug them into the CSTM. Intrinsic experimental results show that the proposed model outperforms the CSTM in terms of both perplexity and convergence speed. Furthermore, extrinsic experimental results show that the proposed model is useful for a document classification task compared with the baseline model. Finally, we qualitatively show that the latent coordinates obtained by training the proposed model are better than those of the baseline model.
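The abstract does not spell out the CSTM formulation, but the idea can be illustrated with a minimal sketch, assuming the standard CSTM word distribution p(w|d) ∝ G0(w)·exp(u_w·v_d), where G0 is a corpus-wide unigram base measure, v_d is a latent document vector, and u_w is a word vector that the proposed model fixes to a pre-trained embedding instead of learning it from scratch. Everything below (sizes, random stand-ins, function names) is hypothetical and is not the authors' implementation:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: vocabulary size V, embedding dimension K.
V, K = 1000, 50

# Pre-trained word embeddings (random stand-ins here; the paper would use
# embeddings pre-trained on a large corpus, e.g., word2vec-style vectors).
word_emb = rng.normal(size=(V, K))

# Base measure G0: the corpus-wide unigram distribution over words.
g0 = rng.dirichlet(np.ones(V))

# Latent document vector, normally estimated during inference
# (e.g., by MCMC in the original CSTM); a random draw for illustration.
doc_vec = rng.normal(scale=0.1, size=K)

def word_probs(doc_vec):
    """CSTM-style word distribution for one document:
    p(w | d) ∝ G0(w) * exp(u_w · v_d),
    with u_w held fixed to a pre-trained embedding."""
    logits = word_emb @ doc_vec + np.log(g0)
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

p = word_probs(doc_vec)
print(p.shape, round(p.sum(), 6))   # (1000,) 1.0

Under this reading, fixing u_w to pre-trained vectors reduces the number of parameters to infer, which is consistent with the faster convergence the abstract reports.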
Anthology ID:
2021.acl-srw.15
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
August
Year:
2021
Address:
Online
Editors:
Jad Kabbara, Haitao Lin, Amandalynne Paullada, Jannis Vamvas
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
138–147
URL:
https://aclanthology.org/2021.acl-srw.15
DOI:
10.18653/v1/2021.acl-srw.15
Cite (ACL):
Seiichi Inoue, Taichi Aida, Mamoru Komachi, and Manabu Asai. 2021. Modeling Text using the Continuous Space Topic Model with Pre-Trained Word Embeddings. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 138–147, Online. Association for Computational Linguistics.
Cite (Informal):
Modeling Text using the Continuous Space Topic Model with Pre-Trained Word Embeddings (Inoue et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-srw.15.pdf