Jean-Luc Verschelde


2008

pdf
A Coreference Corpus and Resolution System for Dutch
Iris Hendrickx | Gosse Bouma | Frederik Coppens | Walter Daelemans | Veronique Hoste | Geert Kloosterman | Anne-Marie Mineur | Joeri Van Der Vloet | Jean-Luc Verschelde
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the main outcomes of the COREA project: a corpus annotated with coreferential relations and a coreference resolution system for Dutch. In the project we developed annotation guidelines for coreference resolution for Dutch and annotated a corpus of 135K tokens. We discuss these guidelines, the annotation tool, and the inter-annotator agreement. We also show a visualization of the annotated relations. The standard approach to evaluate a coreference resolution system is to compare the predictions of the system to a hand-annotated gold standard test set (cross-validation). A more practically oriented evaluation is to test the usefulness of coreference relation information in an NLP application. We run experiments with an Information Extraction module for the medical domain, and measure the performance of this module with and without the coreference relation information. We present the results of both this application-oriented evaluation of our system and of a standard cross-validation evaluation. In a separate experiment we also evaluate the effect of coreference information produced by a simple rule-based coreference module in a Question Answering application.