Abstract
Topic modeling and word embedding are two important techniques for deriving latent semantics from data. General-purpose topic models typically work at a coarse granularity, capturing word co-occurrence at the document or sentence level. In contrast, word embedding models usually work at a much finer granularity, modeling word co-occurrence within small sliding windows. With the aim of deriving latent semantics from word co-occurrence at different levels of granularity, we propose a novel model named Latent Topic Embedding (LTE), which seamlessly integrates topic generation and embedding learning in one unified framework. We further propose an efficient Monte Carlo EM algorithm to estimate the parameters of interest. By retaining the individual advantages of topic modeling and word embedding, LTE yields better latent topics and word embeddings. Extensive experiments verify the superiority of LTE over the state-of-the-art.
- Anthology ID:
- C16-1253
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Yuji Matsumoto, Rashmi Prasad
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 2689–2698
- URL:
- https://aclanthology.org/C16-1253
- Cite (ACL):
- Di Jiang, Lei Shi, Rongzhong Lian, and Hua Wu. 2016. Latent Topic Embedding. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2689–2698, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Latent Topic Embedding (Jiang et al., COLING 2016)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/C16-1253.pdf
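The abstract contrasts document-level co-occurrence (as used by topic models) with sliding-window co-occurrence (as used by embedding models such as skip-gram). As a minimal illustrative sketch of that difference in granularity — not the paper's LTE model, and with a toy corpus, function names, and window size invented for illustration — the two counting schemes can be compared directly:

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each "document" is a list of tokens (illustrative only).
docs = [
    "the cat sat on the mat".split(),
    "topic models capture document level co-occurrence".split(),
]

def document_cooccurrence(docs):
    """Coarse granularity: count every unordered word pair that
    appears together anywhere in the same document."""
    counts = Counter()
    for doc in docs:
        for w1, w2 in combinations(sorted(set(doc)), 2):
            counts[(w1, w2)] += 1
    return counts

def window_cooccurrence(docs, window=2):
    """Fine granularity: count ordered (center, context) pairs only
    within a small sliding window around each token."""
    counts = Counter()
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if i != j:
                    counts[(w, doc[j])] += 1
    return counts

doc_pairs = document_cooccurrence(docs)
win_pairs = window_cooccurrence(docs)

# A distant pair like ("cat", "mat") co-occurs at the document level
# but falls outside a sliding window of size 2.
print(doc_pairs[("cat", "mat")])  # → 1
print(win_pairs[("cat", "mat")])  # → 0
```

LTE's motivation, as stated in the abstract, is that each scheme alone misses signal the other captures, which is why the model combines both levels of granularity in one framework.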