========================================================================
README file for the EMNLP-2012 annotation of the ECB corpus

Release: 1.0 (03/26/2012)
Created: March 26, 2012
========================================================================

---- OVERVIEW ----

This README.txt file accompanies the release of the EMNLP-2012 stand-off
annotation on top of the EventCorefBank (ECB) corpus (Bejan and
Harabagiu, 2010). This annotation includes entity and event
coreference relations within and across documents.  The three major
differences with respect to the original ECB annotation are:
(i) the annotated sentences are *fully* annotated,
(ii) only coreference relations between events are annotated, and
(iii) coreference relations between entities are annotated, following 
the OntoNotes guidelines for coreference annotation.


---- FORMAT ----

The distribution is in the form of stand-off annotations for the ECB
documents. The ECB texts are available at:
http://faculty.washington.edu/bejan/data/ECB1.0.tar.gz
The script run.py will automatically download the ECB documents and
put the annotations together with the text.

The script creates a folder EECB1.0/ and stores the output files there.
Each folder under EECB1.0/data/ contains several documents 
covering one topic. Entity mentions are marked with an <ENTITY>
tag. Event mentions are marked with an <EVENT> tag. Every entity and
event mention includes a COREFID attribute whose numerical value
identifies the coreference chain. Coreferent mentions share the same
COREFID.
Note: An asterisk in the COREFID value means that the tagged span is
part of a split mention and belongs to the previous mention (e.g.,
"into" goes with "checked" in "checked herself into").
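
The tagging scheme above can be read back with a few lines of Python.
The following is only a minimal sketch, not part of the release. It
assumes that each mention is written as <ENTITY COREFID="N">...</ENTITY>
or <EVENT COREFID="N">...</EVENT>, with a double-quoted attribute value
and no nested tags of the same kind; check the files produced by run.py
before relying on this. The sample sentence, the COREFID values in it,
and the names MENTION_RE and read_chains are made up for illustration.
A span whose COREFID contains an asterisk is appended to the latest
mention of the same chain, which is one possible reading of "belongs to
the previous mention".

```python
import re
from collections import defaultdict

# ASSUMPTION: attribute values are double-quoted and tags of the same
# kind are not nested. Adjust the pattern if the real files differ.
MENTION_RE = re.compile(
    r'<(ENTITY|EVENT)\s+COREFID="([^"]+)"\s*>(.*?)</\1>',
    re.DOTALL,
)


def read_chains(text):
    """Group mention strings by (tag, COREFID).

    Entity and event IDs are kept apart because the README does not
    say whether they share one numbering.
    """
    chains = defaultdict(list)
    for tag, coref_id, span in MENTION_RE.findall(text):
        is_continuation = "*" in coref_id
        key = (tag, coref_id.replace("*", ""))
        if is_continuation and chains[key]:
            # Second half of a split mention: glue it to the previous
            # mention of the same chain.
            chains[key][-1] += " " + span
        else:
            chains[key].append(span)
    return dict(chains)


# Invented sample, modeled on the "checked herself into" example.
sample = (
    'She <EVENT COREFID="3">checked</EVENT> '
    '<ENTITY COREFID="7">herself</ENTITY> '
    '<EVENT COREFID="3*">into</EVENT> '
    '<ENTITY COREFID="9">rehab</ENTITY>.'
)
chains = read_chains(sample)
for (tag, coref_id), mentions in sorted(chains.items()):
    print(tag, coref_id, mentions)
```

To collect cross-document chains, the same function could be applied to
the concatenated text of all files in one topic folder under
EECB1.0/data/, since coreferent mentions share the same COREFID.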


---- CORPUS STATISTICS ----

topics: 43
documents: 482
entities: 1068
entity mentions: 5447
events: 774
event mentions: 2533


---- LICENSE ----

This annotation is made available under the Public Domain Dedication
and License (PDDL) v1.0 whose full text can be found at:
http://www.opendatacommons.org/licenses/pddl/1.0/


---- REFERENCES ----

Cosmin Bejan and Sanda Harabagiu. 2010. Unsupervised Event Coreference
Resolution with Rich Linguistic Features. In Proceedings of ACL 2010,
pages 1412–1422.

Heeyoung Lee, Marta Recasens, Angel Chang, Mihai Surdeanu and Dan 
Jurafsky. 2012. Joint Entity and Event Coreference Resolution across 
Documents. In Proceedings of EMNLP 2012.

---- CONTENTS ----

   * mentions.txt 
     The stand-off annotations of ECB files.
     
   * run.py
     This script downloads the ECB text and merges it with the stand-off 
     annotations, giving as output the annotated files. It creates a
     folder EECB1.0/, and stores the output files in the folder. Tested
     only on Linux.

   * event_coref_guidelines.pdf
     This document describes the annotation guidelines that were 
     followed to create this annotation.

   * LICENSE.pdf
     The PDDL license v1.0 under which this annotation is made
     available.
     
========================================================================
