Evaluation of Unsupervised Information Extraction

Wei Wang, Romaric Besançon, Olivier Ferret, Brigitte Grau


Abstract
Unsupervised methods are gaining increasing attention in information extraction, as they make it possible to design more open extraction systems. In unsupervised information extraction, clustering methods are of particular importance. However, evaluating clustering results remains difficult at a large scale, especially in the absence of a reliable reference. Drawing on our experiments in unsupervised relation extraction, we first discuss in this article how to evaluate clustering quality without a reference by relying on internal measures. We then propose a method, supported by a dedicated annotation tool, for building a set of reference clusters of relations from a corpus. We apply it to our experimental framework and thereby show how to build, in a short time, a significant reference for unsupervised relation extraction: 80 clusters gathering more than 4,000 relation instances. Finally, we present how such a reference is used to evaluate clustering with external measures, and we analyze the results of applying these measures to the clusters of relations produced by our unsupervised relation extraction system.
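The abstract does not say which external measures the paper applies. Purely as an illustration, the sketch below computes B-cubed precision, recall and F1, a widely used external clustering measure that compares a system clustering with reference clusters item by item. It assumes that both the system output and the reference are given as a mapping from a relation-instance id to a cluster id. The function name `bcubed` and the input format are hypothetical, not taken from the paper.

```python
from collections import Counter
from collections.abc import Hashable, Mapping


def bcubed(system: Mapping[Hashable, Hashable],
           reference: Mapping[Hashable, Hashable]) -> tuple[float, float, float]:
    """Illustrative B-cubed precision, recall and F1 (hypothetical helper).

    Each mapping assigns an item (e.g. a relation instance id) to a cluster id.
    Only items that appear in both the reference and the system output are scored.
    """
    items = [i for i in reference if i in system]
    if not items:
        raise ValueError("no item is shared by the system output and the reference")

    # Number of items in each (system cluster, reference cluster) intersection
    overlap_sizes = Counter((system[i], reference[i]) for i in items)
    system_sizes = Counter(system[i] for i in items)
    reference_sizes = Counter(reference[i] for i in items)

    precision = recall = 0.0
    for i in items:
        # Items sharing both the system cluster and the reference cluster of i (i included)
        overlap = overlap_sizes[(system[i], reference[i])]
        precision += overlap / system_sizes[system[i]]
        recall += overlap / reference_sizes[reference[i]]

    n = len(items)
    precision /= n
    recall /= n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: four relation instances, two system clusters, two reference clusters
    system_clusters = {"r1": 0, "r2": 0, "r3": 1, "r4": 1}
    reference_clusters = {"r1": "X", "r2": "X", "r3": "X", "r4": "Y"}
    print(bcubed(system_clusters, reference_clusters))
```

On the toy data, precision is 0.75 and recall is 2/3. B-cubed is shown here because it scores each relation instance individually, which suits references that, like the one described above, cover only a subset of the instances clustered by the system.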
Anthology ID:
L12-1313
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
552–558
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/553_Paper.pdf
Cite (ACL):
Wei Wang, Romaric Besançon, Olivier Ferret, and Brigitte Grau. 2012. Evaluation of Unsupervised Information Extraction. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 552–558, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Evaluation of Unsupervised Information Extraction (Wang et al., LREC 2012)