Annotating Near-Identity from Coreference Disagreements

Marta Recasens, M. Antònia Martí, Constantin Orasan


Abstract
We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introducing a middle-ground category to explain currently problematic cases. The first annotation effort that we describe shows that it is not possible to annotate near-identity explicitly because subjects are not fully aware of it. Therefore, our second annotation effort used an indirect method, and arrived at near-identity annotations by inference from the disagreements between five annotators who had only a binary choice between coreference and non-coreference. The results show that whereas as few as 2-6% of the relations were explicitly annotated as near-identity in the first effort, up to 12-16% of the relations turned out to be near-identical following the indirect method of the second effort.
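The indirect method described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible inference rule: a relation on which all annotators agree is treated as identity or non-identity, and any disagreement is treated as near-identity. The abstract does not spell out the rule, so this is an assumption, and the function names and toy votes are hypothetical, not taken from the paper or its corpora.

```python
from collections import Counter


def label_relation(votes):
    """Map one relation's binary votes to a label.

    votes: list of booleans, one per annotator (True = "coreferent").
    Assumed rule (not confirmed by the source): unanimity gives a
    clear-cut label, any disagreement gives "near-identity".
    """
    if not votes:
        raise ValueError("at least one vote is required")
    yes = sum(votes)
    if yes == len(votes):
        return "identity"
    if yes == 0:
        return "non-identity"
    return "near-identity"


def near_identity_share(relations):
    """Fraction of relations labelled near-identity.

    relations: dict mapping a relation id to its list of votes.
    """
    labels = Counter(label_relation(v) for v in relations.values())
    total = sum(labels.values())
    return labels["near-identity"] / total if total else 0.0


# Toy data, invented for illustration: four relations, five annotators each.
toy_relations = {
    "r1": [True, True, True, True, True],
    "r2": [True, True, False, True, True],
    "r3": [False, False, False, False, False],
    "r4": [True, True, True, True, True],
}
share = near_identity_share(toy_relations)
```

A finer-grained variant could keep the number of agreeing annotators as a degree of near-identity instead of collapsing all disagreements into one label.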
Anthology ID:
L12-1391
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
165–172
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/674_Paper.pdf
Cite (ACL):
Marta Recasens, M. Antònia Martí, and Constantin Orasan. 2012. Annotating Near-Identity from Coreference Disagreements. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 165–172, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Annotating Near-Identity from Coreference Disagreements (Recasens et al., LREC 2012)