These three files contain data associated with:

"Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection"
Youxuan Jiang, Jonathan K. Kummerfeld and Walter S. Lasecki
ACL 2017

The files are:

prompt-sentences.csv - The initial sentences, 20 from each domain. Each line is
an ID, followed by a sentence.

answers.csv - For the 20 advising questions, this provides the sentence ID,
sentence, and answer.

paraphrases-all.csv - Paraphrases collected across all conditions. The fields
are:

  ID            - A unique number for each sentence.
  originalID    - The ID of the original prompt sentence, referring to the
                  prompt-sentences.csv file.
  promptID      - The ID of the sentence shown as a prompt (either from the
                  other file, or from this file in the 'chain' condition).
  condition     - One of the following labels:
                  baseline, bonus-none, bonus-novelty, dialogue,
                  examples-lexical, examples-mixed, examples-none, geoquery,
                  one-paraphrase, chain, ubuntu, wsj
  workerID      - The ID of the crowd worker who wrote the paraphrase.
  correct       - The adjudicated correctness score (0 or 1).
  grammatical   - The adjudicated grammaticality score (0 or 1).
  pinc          - The PINC score for this sentence compared to the original
                  sentence (in the chain condition, the comparison is with the
                  original sentence, not the prompt sentence).
  time          - The time taken to write the paraphrase, in seconds. A value of
                  0 is given for the first paraphrase a worker wrote (when they
                  also had to read the instructions).
  sentence      - The paraphrase.
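As a minimal sketch, the fields documented above could be loaded with Python's
csv module along these lines. This assumes the columns appear in the documented
order and that the file has no header row; adjust if it does. The function name
read_paraphrases is illustrative, not part of the release.

```python
import csv

# Column names for paraphrases-all.csv, in the order documented above.
FIELDS = ["ID", "originalID", "promptID", "condition", "workerID",
          "correct", "grammatical", "pinc", "time", "sentence"]

def read_paraphrases(path):
    """Yield one dict per paraphrase, converting the numeric fields."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            record = dict(zip(FIELDS, row))
            record["correct"] = int(record["correct"])       # 0 or 1
            record["grammatical"] = int(record["grammatical"])  # 0 or 1
            record["pinc"] = float(record["pinc"])
            record["time"] = float(record["time"])           # seconds; 0 for first
            yield record
```

From there, for example, filtering on record["condition"] selects the
paraphrases collected under a single condition.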

