Fuzzy V-Measure - An Evaluation Method for Cluster Analyses of Ambiguous Data

Jason Utt, Sylvia Springorum, Maximilian Köper, Sabine Schulte im Walde


Abstract
This paper discusses an extension of the V-measure (Rosenberg and Hirschberg, 2007), an entropy-based cluster evaluation metric. While the original work focused on evaluating hard clusterings, we introduce the Fuzzy V-measure, which can be used on data that is inherently ambiguous. We perform multiple analyses, varying the sizes and ambiguity rates, and show that while entropy-based measures in general tend to suffer when ambiguity increases, a measure with desirable properties can be derived from them in a straightforward manner.
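For orientation, the original hard-clustering V-measure that the abstract builds on can be sketched as below. This is only the standard definition from Rosenberg and Hirschberg (2007), i.e. the weighted harmonic mean of homogeneity and completeness; it is not the Fuzzy V-measure introduced in the paper, whose details are not given in this record. The function names (`v_measure`, `_entropy`, `_conditional_entropy`) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy H(X) of a list of labels (natural log)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) for two equally long, item-aligned label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, V) for a HARD clustering.

    classes  -- gold class label per item
    clusters -- assigned cluster label per item
    beta     -- weight; beta > 1 favours completeness, beta < 1 homogeneity
    """
    h_c = _entropy(classes)
    h_k = _entropy(clusters)
    # By convention each score is 1 when the normalising entropy is zero.
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    denom = beta * homogeneity + completeness
    v = 0.0 if denom == 0 else (1 + beta) * homogeneity * completeness / denom
    return homogeneity, completeness, v
```

In this sketch every item carries exactly one class and one cluster label, which is the hard-clustering assumption that the paper sets out to relax for ambiguous data.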
Anthology ID:
L14-1639
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
581–587
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/829_Paper.pdf
Cite (ACL):
Jason Utt, Sylvia Springorum, Maximilian Köper, and Sabine Schulte im Walde. 2014. Fuzzy V-Measure - An Evaluation Method for Cluster Analyses of Ambiguous Data. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 581–587, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Fuzzy V-Measure - An Evaluation Method for Cluster Analyses of Ambiguous Data (Utt et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/829_Paper.pdf