Distributed Document and Phrase Co-embeddings for Descriptive Clustering
Motoki Sato, Austin J. Brockmeier, Georgios Kontonatsios, Tingting Mu, John Y. Goulermas, Jun’ichi Tsujii, Sophia Ananiadou
Abstract
Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
- Anthology ID:
- E17-1093
- Volume:
- Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 991–1001
- URL:
- https://aclanthology.org/E17-1093
- Cite (ACL):
- Motoki Sato, Austin J. Brockmeier, Georgios Kontonatsios, Tingting Mu, John Y. Goulermas, Jun’ichi Tsujii, and Sophia Ananiadou. 2017. Distributed Document and Phrase Co-embeddings for Descriptive Clustering. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 991–1001, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Distributed Document and Phrase Co-embeddings for Descriptive Clustering (Sato et al., EACL 2017)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/E17-1093.pdf
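Illustrative sketch (not from the paper): the abstract describes selecting, for each document cluster, a descriptive phrase from a space in which phrases and documents are embedded together. The Python below shows one simple way such a selection could work, assuming the document and phrase vectors already exist in a shared space. Training the paragraph vector model is not shown. The toy vectors, the phrase strings, the farthest-first k-means, and the nearest-centroid rule are all assumptions made for illustration and should not be read as the authors' actual procedure.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def farthest_first_init(points, k):
    """Deterministic initialisation: start at row 0, then repeatedly add the
    point farthest from all centroids chosen so far."""
    chosen = [0]
    for _ in range(k - 1):
        dists = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(dists.min(axis=1).argmax()))
    return points[chosen].copy()

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns a cluster index per row and the centroids."""
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

def describe_clusters(doc_vecs, phrase_vecs, phrases, k):
    """Cluster documents, then label each cluster with the phrase whose vector
    has the highest cosine similarity to the cluster centroid (an assumed rule)."""
    docs = normalize(np.asarray(doc_vecs, dtype=float))
    phr = normalize(np.asarray(phrase_vecs, dtype=float))
    labels, centroids = kmeans(docs, k)
    sims = normalize(centroids) @ phr.T  # shape: (k, number of phrases)
    names = {j: phrases[int(sims[j].argmax())] for j in range(k)}
    return labels, names

# Toy 2-D co-embedding (made-up values): two obvious groups of documents
# and three candidate phrases living in the same space.
doc_vecs = [[1.0, 0.1], [0.9, 0.2], [0.95, 0.0],
            [0.1, 1.0], [0.2, 0.9], [0.0, 0.95]]
phrases = ["gene expression", "stock markets", "general overview"]
phrase_vecs = [[1.0, 0.05], [0.05, 1.0], [0.7, 0.7]]

labels, names = describe_clusters(doc_vecs, phrase_vecs, phrases, k=2)
print(labels.tolist(), names)
```

Because phrases and documents share one vector space, the labelling step reduces to a nearest-neighbour lookup around each cluster centroid. A real pipeline would obtain the vectors from a trained embedding model and would likely use a more careful clustering and ranking scheme.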