John Y. Goulermas
2017
Distributed Document and Phrase Co-embeddings for Descriptive Clustering
Motoki Sato | Austin J. Brockmeier | Georgios Kontonatsios | Tingting Mu | John Y. Goulermas | Jun’ichi Tsujii | Sophia Ananiadou
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
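The labelling idea in the abstract can be illustrated with a minimal sketch. It assumes documents and candidate phrases already have vectors in one shared space (the co-embedding). The toy vectors, the plain k-means step, the function names, and the rule "pick the phrase with the highest cosine similarity to the cluster centroid" are all illustrative assumptions, not the authors' confirmed procedure. The paper learns its vectors with the paragraph vector model, which is not reproduced here.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kmeans(x, init_centroids, n_iter=20):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    centroids = init_centroids.copy()
    for _ in range(n_iter):
        # Assign each document to its closest centroid (squared Euclidean distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = x[labels == k].mean(axis=0)
    return labels, centroids

def label_clusters(centroids, phrase_vecs, phrases):
    """For each cluster, return the phrase most cosine-similar to its centroid."""
    sims = normalize(centroids) @ normalize(phrase_vecs).T
    return [phrases[i] for i in sims.argmax(axis=1)]

# Hypothetical 2-D co-embedding: four documents and two candidate phrases.
doc_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
phrases = ["gene expression", "stock market"]
phrase_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])

# Seed the two clusters with documents 0 and 2 to keep the run deterministic.
labels, centroids = kmeans(doc_vecs, init_centroids=doc_vecs[[0, 2]])
cluster_names = label_clusters(centroids, phrase_vecs, phrases)
print(labels.tolist(), cluster_names)
```

Because phrases and documents share one space, no separate labelling model is needed in this sketch: the descriptive label is simply the nearest phrase vector to each cluster centroid.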