Thank you for your participation in this and other similar
HITs!
Please take a moment to familiarize yourself with this new HIT
by reading the instructions/examples, because things have changed a
bit. Thanks again for your work!
In this HIT you will be presented with a dialogue between two
people. You will also be given a system generation, which aims to
contain the next line of the conversation. Your job is to rate the
system generation across 2 axes:
- Fluency/Grammaticality: Is the system's
generation grammatical, easy-to-read, and
fluent?
- Quality/Coherence: Is the next utterance
coherent, reasonable, and the type of thing a
person might say, in the context of the dialogue
history?
You will rate each of the two axes on a scale from 1 to 5, with
1 being the lowest/worst and 5 the highest/best. The specific scales are:
- Fluency/Grammaticality:
  - 5/5 (excellent): The generation is grammatical and fluent.
  - 4/5 (good): The sentence largely makes sense, but there are
    some small grammar issues/out-of-place words that don't make
    for the best writing.
  - 3/5 (okay): The grammar is okay and it's possible to read,
    but it definitely doesn't sound like a human wrote it.
  - 2/5 (poor): Even though I can kind-of tell the meaning,
    it's difficult to read this unnatural sentence.
  - 1/5 (terrible): The generation has severe errors in
    grammaticality/is almost or completely unreadable.
- Quality/Coherence:
  - 5/5 (perfectly coherent, interesting): The generated next
    utterance is very relevant and coherent with the dialogue
    context; a human might say this.
  - 4/5 (mostly relevant): The generation is relevant but not
    perfect given the dialogue context.
  - 3/5 (neutral): The generation is somewhat plausible/relevant.
  - 2/5 (mostly irrelevant): I see why this could be generated
    but it doesn't make much sense.
  - 1/5 (wrong/nonsense/irrelevant): The generation doesn't
    seem to apply to the dialogue at all or doesn't make any
    sense.