This resource was built on the RTE-6 dataset (Luisa Bentivogli, Peter Clark, Ido Dagan, Hoa Trang Dang, and Danilo Giampiccolo. 2010. The Sixth PASCAL Recognizing Textual Entailment Challenge. In Proceedings of TAC).
The resource is free to use and redistribute. We kindly ask researchers who use this resource in their work to cite our paper.
Due to licensing restrictions, we are not allowed to release the 2010 RTE corpus itself. Users must download the corpus from http://www.nist.gov/tac/data/index.html after completing the required user agreement.
For convenience, the resource is provided in two formats: (1) an Excel file, (2) an XML file.
The contents of both files are identical.
Each row in the Excel file, and each "instance" element in the XML file, represents one instance.
Most of the Excel columns (and the corresponding attributes/elements in the XML file) define the instance.
The exceptions are the "entailment-annotation" column (the "entailment" attribute in the XML file), which holds the annotation made by the RTE annotators (see the paper referenced above),
and the last two columns, "implicit-argument annotation" and "other argument corefers" (the "annotation" element in the XML file), which hold our annotations.
The Excel file has the following columns (which appear also as attributes/elements in the XML file):
dataset-name - The RTE dataset: either RTE6-DEV or RTE6-TEST.
entailment-annotation - Whether the instance was extracted from a pair of {document-sentence,hypothesis} in which the sentence entails the hypothesis, or not.
uuid - A unique identifier of the instance.
topic-id - The topic-id in the RTE corpus.
document-id - The document-id in the RTE corpus.
sentence-number - The number of the sentence (in the document) in which the predicate appears.
hypothesis-id - The hypothesis-id in the RTE corpus.
predicate-word - The predicate word, as it appears in the hypothesis.
predicate-lemma - The predicate lemma (given by the WordNet lemmatizer embedded in the EasyFirst parser).
argument-word - The argument word, as it appears in the hypothesis.
argument-lemma - The argument lemma (given by the WordNet lemmatizer embedded in the EasyFirst parser).
argument-phrase in hypothesis - The argument phrase, as it appears in the hypothesis.
argument-type - The relation of the argument to the predicate in the hypothesis. One of SUBJECT, OBJECT, MODIFIER, or UNKNOWN.
argument sentence-number in document - The number of the sentence in which the candidate-argument-filler appears in the document.
implicit-argument annotation - Our annotation: whether the relation of the argument to the predicate in the hypothesis also holds (implicitly) in the document.
other argument corefers - An additional annotation we added: whether another (explicit) argument of the predicate exists in the document that co-refers with the given candidate-argument-filler (according to human judgment).
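
As a rough illustration of reading the XML format, here is a minimal Python sketch. It assumes only what is stated above: that each instance is an "instance" element and that the RTE annotation is stored in an "entailment" attribute. Everything else in the sample (the root element name, which fields are attributes and which are child elements, the attribute names inside "annotation", and the YES/NO values) is hypothetical and should be checked against the actual file. To avoid depending on those details, the sketch flattens every attribute and every non-empty child element into one dictionary per instance.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only; the real file's layout,
# field placement, and annotation values may differ.
SAMPLE = """<instances>
  <instance uuid="example-1" entailment="YES" dataset-name="RTE6-DEV">
    <predicate-word>attacked</predicate-word>
    <annotation implicit-argument="YES" other-argument-corefers="NO"/>
  </instance>
</instances>"""


def flatten(element, record):
    """Copy an element's attributes and its descendants' text into record."""
    record.update(element.attrib)
    for child in element:
        text = (child.text or "").strip()
        if text:
            record[child.tag] = text
        flatten(child, record)
    return record


def read_instances(source):
    """Return one flat dict per "instance" element.

    source may be a file path or a file-like object.
    """
    root = ET.parse(source).getroot()
    return [flatten(inst, {}) for inst in root.iter("instance")]


instances = read_instances(io.StringIO(SAMPLE))

# Keep only instances taken from entailing sentence-hypothesis pairs
# (the "YES" value is an assumption about how entailment is encoded).
entailing = [r for r in instances if r.get("entailment") == "YES"]
```

To read the distributed file instead of the sample, pass its path to read_instances. Because the fields are merged into one dictionary, two fields with the same name at different nesting levels would overwrite each other, so this approach suits quick inspection more than faithful round-tripping.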