C-3: Coherence and Coreference Corpus

Cristina Nicolae, Gabriel Nicolae, Kirk Roberts


Abstract
The phenomenon of coreference, covering entities, their mentions and their properties, is intricately linked to the phenomenon of coherence, covering the structure of rhetorical relations in a discourse. A text corpus that has both phenomena annotated can be used to test hypotheses about their interrelation or to detect other phenomena. We present the process by which C-3, a new corpus, was obtained by annotating the Discourse GraphBank coherence corpus with entity and mention information. The annotation followed a set of ACE guidelines adapted to favor coreference and to include entities of unknown types in the annotation. Together with the corpus we offer a new annotation tool specifically designed to annotate entity and mention information within a simple and functional graphical interface that combines the “best of all worlds” from available annotation tools. The potential usefulness of C-3 is discussed, as well as an application in which the corpus proved to be a valuable resource.
Anthology ID:
L10-1428
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/622_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Cristina Nicolae, Gabriel Nicolae, and Kirk Roberts. 2010. C-3: Coherence and Coreference Corpus. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
C-3: Coherence and Coreference Corpus (Nicolae et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/622_Paper.pdf