Abstract
We propose a new problem called coordinated topic modeling that imitates human behavior when describing a text corpus. It treats a set of well-defined topics as the axes of a semantic space, each with a reference representation, and uses these axes to model a corpus in an easily understandable form. This new task makes corpus representations more interpretable by reusing existing knowledge and benefits the corpus-comparison task. We design ECTM, an embedding-based coordinated topic model that effectively uses the reference representations to capture the target corpus's specific aspects while maintaining each topic's global semantics. In ECTM, we introduce topic- and document-level supervision with a self-training mechanism to solve the problem. Finally, extensive experiments on multiple domains show the superiority of our model over other baselines.
- Anthology ID:
- 2022.emnlp-main.668
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9831–9843
- URL:
- https://aclanthology.org/2022.emnlp-main.668
- DOI:
- 10.18653/v1/2022.emnlp-main.668
- Cite (ACL):
- Pritom Saha Akash, Jie Huang, and Kevin Chen-Chuan Chang. 2022. Coordinated Topic Modeling. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9831–9843, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Coordinated Topic Modeling (Akash et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2022.emnlp-main.668.pdf