Improving Deep Embedded Clustering via Learning Cluster-level Representations

Qing Yin, Zhihua Wang, Yunya Song, Yida Xu, Shuai Niu, Liang Bai, Yike Guo, Xian Yang


Abstract
Driven by recent advances in neural networks, various Deep Embedded Clustering (DEC)-based short text clustering models have been developed. In these works, latent representation learning and text clustering are performed jointly. Although such methods are increasingly popular, their purely cluster-oriented objectives can produce meaningless representations. To alleviate this problem, several improvements introduce additional learning objectives into the clustering process, such as contrastive learning. However, existing efforts rely heavily on learning meaningful representations at the instance level and pay limited attention to global representations, which are necessary to capture the overall data structure at the cluster level. In this paper, we propose a novel DEC model, the deep embedded clustering model with cluster-level representation learning (DECCRL), which jointly learns cluster- and instance-level representations. Here, we extend the embedded topic modelling approach to introduce reconstruction constraints that help learn cluster-level representations. Experimental results on real-world short text datasets demonstrate that our model produces meaningful clusters.
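For context, the DEC family of models that this paper extends assigns each embedded point a soft cluster membership via a Student's t kernel and trains against a sharpened target distribution. The sketch below shows that standard DEC step (Xie et al., 2016) in NumPy; it is a minimal illustration of the base technique, not the authors' DECCRL implementation, and the toy data are invented for demonstration.

```python
import numpy as np

def dec_soft_assignment(z, mu, alpha=1.0):
    """Soft assignment q_ij from DEC: a Student's t kernel between
    embedded point z_i and cluster centroid mu_j, normalised per row."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # (n, k) squared distances
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def dec_target_distribution(q):
    """Sharpened target p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'),
    where f_j = sum_i q_ij is the soft cluster frequency. Training
    minimises KL(p || q), pushing points toward confident assignments."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

# Toy example: four 2-D embeddings around two centroids (illustrative values).
z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
q = dec_soft_assignment(z, mu)
p = dec_target_distribution(q)
```

A purely cluster-oriented model optimises only this KL objective; the paper's point is that pairing it with additional (e.g. reconstruction-based) objectives keeps the learned representations meaningful.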
Anthology ID: 2022.coling-1.195
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2226–2236
URL: https://aclanthology.org/2022.coling-1.195
Cite (ACL):
Qing Yin, Zhihua Wang, Yunya Song, Yida Xu, Shuai Niu, Liang Bai, Yike Guo, and Xian Yang. 2022. Improving Deep Embedded Clustering via Learning Cluster-level Representations. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2226–2236, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Improving Deep Embedded Clustering via Learning Cluster-level Representations (Yin et al., COLING 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.195.pdf