This file describes all of the data files included as part of the
submission.

There are two kinds of data files in this tarball:
 - SET A: The first set consists of the data files that were used directly
   for the analysis in the submission. These are the files pertaining to
   the extraneous preposition error detection task. The evaluation code we
   have provided takes these files as input.

 - SET B: The second set contains files pertaining to three other grammatical
   error detection tasks: article errors, confused preposition errors
   and collocation errors. These files also contain the expert and crowd
   judgments. The only difference between this set and Set A above is that
   these files do not contain any system output or predictions, whereas the
   extraneous preposition data files also contain predictions from two
   different systems for each instance. Therefore, the evaluation code we
   have provided will NOT work with these files. We are providing them only
   to share them with the community as a starting point for creating larger
   shared data sets.

There are a total of 7 files besides this README.

WHAT FILES ARE INCLUDED
-----------------------

SET A: Files used for analysis in the paper

1. extraneous_preps_crowd_and_expert.csv: A CSV file containing 923 rows, each of which
   is a potential extraneous preposition instance that has been judged by both experts
   and Turkers. Each row contains the following fields, in order:

   - "id": a unique instance ID. 
   - "prep": the preposition being judged
   - "sentence": the sentence containing the preposition
   - "preplocation": the location of the preposition in the sentence (word number, starting from 1)
   - "sys1pred": the prediction of system 1 for this instance ('1' represents ERROR, '3' represents OK)
   - "sys2pred": the prediction of system 2 for this instance
   - "internal": the internal or expert judgments for this instance, separated by "|". 
      For example, with three experts this can be the string "3|3|1"
   - "crowd": the external or crowd/Turker judgments for this instance, separated by "|"

2. extraneous_preps-bin1-50-75.csv: A subset of the full file containing only instances
   where the Turker agreement with the majority rating is greater than or equal to 50%
   (the lowest possible value with a binary class) and less than 75%. The fields are the
   same as those for the full CSV file.

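3. extraneous_preps-bin2-75-90.csv: A subset of the full file containing only instances
   where the Turker agreement with the majority rating is greater than or equal to 75%
   and less than 90%. The fields are the same as those for the full CSV file.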

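4. extraneous_preps-bin3-90-100.csv: A subset of the full file containing only instances
   where the Turker agreement with the majority rating is greater than or equal to 90%.
   The fields are the same as those for the full CSV file.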
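
The following is a minimal Python sketch (not part of the provided evaluation code)
of how a row from the Set A files might be read. It follows the field order listed
for file 1; the helper names (FIELDS, LABELS, parse_row, read_instances), the UTF-8
encoding, and the has_header flag are assumptions made for illustration, since this
README does not say whether the files begin with a header row.

```python
import csv

# Field order for extraneous_preps_crowd_and_expert.csv, as listed above.
# The three bin files use the same fields.
FIELDS = ["id", "prep", "sentence", "preplocation",
          "sys1pred", "sys2pred", "internal", "crowd"]

# Judgment codes described in this README.
LABELS = {"1": "ERROR", "3": "OK"}


def parse_row(row):
    """Convert one CSV row (a list of strings) into a dict keyed by field name.

    "internal" and "crowd" hold several judgments joined by "|", so they are
    split into lists of code strings, e.g. "3|3|1" -> ["3", "3", "1"].
    """
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
    rec = dict(zip(FIELDS, row))
    rec["preplocation"] = int(rec["preplocation"])  # word number, starting from 1
    rec["internal"] = rec["internal"].split("|")
    rec["crowd"] = rec["crowd"].split("|")
    return rec


def read_instances(path, has_header=False):
    """Read all instances from one Set A file.

    Assumptions (hypothetical): the file is UTF-8, and has_header should be
    set to True only if the first row holds column names instead of data.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = csv.reader(f)
        if has_header:
            next(rows, None)
        # csv.reader yields [] for blank lines; skip them.
        return [parse_row(row) for row in rows if row]
```

Using the csv module rather than splitting on commas matters here, because the
"sentence" field can itself contain commas. If words are whitespace-delimited, the
judged preposition can be recovered with sentence.split()[preplocation - 1]; the
tokenization used to compute "preplocation" is not specified in this README.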
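
One way to compute the Turker agreement with the majority rating that defines the
bin files is sketched below in Python. It assumes that agreement means the
percentage of crowd judgments equal to the most frequent judgment; the 75-90 and
90-100 boundaries for bins 2 and 3 are read off the file names, and the function
names are hypothetical.

```python
from collections import Counter


def majority_agreement(crowd_field):
    """Percentage of Turker judgments that match the majority rating.

    crowd_field is the raw "crowd" string, e.g. "1|1|3|1". On a tie the
    majority count is the same whichever label wins, so the result is
    well defined.
    """
    votes = crowd_field.split("|")
    majority_count = Counter(votes).most_common(1)[0][1]
    return 100.0 * majority_count / len(votes)


def agreement_bin(agreement):
    """Map an agreement percentage onto the name of one of the three bins.

    Assumption: bin 3 includes 100%, since unanimous agreement must fall
    somewhere and the file name gives 90-100 as its range.
    """
    if 50 <= agreement < 75:
        return "bin1-50-75"
    if 75 <= agreement < 90:
        return "bin2-75-90"
    if 90 <= agreement <= 100:
        return "bin3-90-100"
    raise ValueError(f"agreement {agreement} is outside the 50-100 range")
```

With binary judgments a tie such as "1|3" gives exactly 50%, which is why no
instance can fall below the first bin.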

SET B: Additional files provided to the community

5. articles_expert_and_crowd.csv: A CSV file containing 156 rows, each of which
   is a potential article error instance that has been judged by both experts
   and Turkers. Each row contains the following fields, in order:

   - "id": a unique instance ID. 
   - "np": the noun phrase containing the article being judged
   - "sentence": the sentence containing the article
   - "internal": the internal or expert judgments for this instance, separated by "|". 
      For example, with three experts this can be the string "3|3|1"  ('1' represents ERROR, '3' represents OK)
   - "crowd": the external or crowd/Turker judgments for this instance, separated by "|"

6. collocations_expert_and_crowd.csv: A CSV file containing 149 rows, each of which
   is a potential collocation error instance that has been judged by both experts
   and Turkers. Each row contains the following fields, in order:

   - "id": a unique instance ID. 
   - "collocation": the collocation being judged
   - "sentence": the sentence containing the collocation
   - "internal": the internal or expert judgments for this instance, separated by "|". 
      For example, with three experts this can be the string "3|3|1" ('1' represents ERROR, '3' represents OK)
   - "crowd": the external or crowd/Turker judgments for this instance, separated by "|"

7. confusion_preps_expert_and_crowd.csv: A CSV file containing 152 rows, each of which
   is a potential confused preposition error instance that has been judged by both
   experts and Turkers. Each row contains the following fields, in order:

   - "id": a unique instance ID. 
   - "prep": the preposition being judged
   - "sentence": the sentence containing the preposition
   - "preplocation": the location of the preposition in the sentence (word number, starting from 1)
   - "internal": the judgment of a single expert for this instance ('1' represents ERROR, '3' represents OK)
   - "crowd": the external or crowd/Turker judgments for this instance, separated by "|"
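
Because the Set B files do not share a single column layout, a reader for them has
to know which file it is parsing. The Python sketch below maps each file name to the
field order listed above; the helper names (SET_B_FIELDS, parse_set_b_row,
read_set_b), the UTF-8 encoding, and the has_header flag are hypothetical. In
confusion_preps_expert_and_crowd.csv the "internal" field holds a single judgment,
so splitting it on "|" yields a one-element list.

```python
import csv
import os

# Field order for each Set B file, as listed above. None of these files has the
# sys1pred/sys2pred columns, which is why the evaluation code cannot use them.
SET_B_FIELDS = {
    "articles_expert_and_crowd.csv":
        ["id", "np", "sentence", "internal", "crowd"],
    "collocations_expert_and_crowd.csv":
        ["id", "collocation", "sentence", "internal", "crowd"],
    "confusion_preps_expert_and_crowd.csv":
        ["id", "prep", "sentence", "preplocation", "internal", "crowd"],
}


def parse_set_b_row(filename, row):
    """Convert one row of a Set B file into a dict keyed by field name."""
    fields = SET_B_FIELDS[filename]
    if len(row) != len(fields):
        raise ValueError(f"{filename}: expected {len(fields)} fields, got {len(row)}")
    rec = dict(zip(fields, row))
    # Judgments are joined by "|"; a single expert judgment becomes a 1-element list.
    rec["internal"] = rec["internal"].split("|")
    rec["crowd"] = rec["crowd"].split("|")
    if "preplocation" in rec:
        rec["preplocation"] = int(rec["preplocation"])  # word number, from 1
    return rec


def read_set_b(path, has_header=False):
    """Read all instances from a Set B file, choosing the layout by file name.

    Assumptions (hypothetical): UTF-8 encoding and no header row unless
    has_header is True.
    """
    filename = os.path.basename(path)
    with open(path, newline="", encoding="utf-8") as f:
        rows = csv.reader(f)
        if has_header:
            next(rows, None)
        return [parse_set_b_row(filename, row) for row in rows if row]
```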



