In this HIT you will be presented with a
news article and an automatically generated summary of that article.
Your job is to rate the system's
generation across 2 axes:
- Coherence/Fluency: Is the system's
generation grammatical, easy-to-read, and
well-written?
- Summary Quality: Does the system's
generation meaningfully capture the main points in
the article?
You will be able to rate each of the two axes on a scale from 1
to 5, with 1 being the lowest/worst and
5 the highest/best. The specific scales
are:
- Coherence/Fluency:
- 5/5 (excellent): The summary
itself is grammatical, fluent, and reasonable.
- 4/5 (good): The summary generally makes sense, but there
are minor grammatical errors or topical shifts that don't make
for the best writing.
- 3/5 (okay): I can see why this summary was generated, and
it's somewhat readable, but there are problems that can't be
ignored.
- 2/5 (poor): Some parts of the summary could make sense, but
it's unnatural, illogical, or quite hard to read.
- 1/5 (terrible): The summary has
nothing to do with the article, and/or there are severe errors
in grammaticality or fluency.
- Summary Quality:
- 5/5 (very good): The summary
correctly and completely captures the key points of the
article.
- 4/5 (mostly good): The summary captures most of the key
points from the article.
- 3/5 (neutral): While there are no egregious
inconsistencies, there are major missing key points, or some
partly incorrect summarizations.
- 2/5 (mostly bad): The summary mentions some things from the
article, but there are few reasonable points.
- 1/5 (awful): The summary has
nothing to do with the article or severely contradicts it.
You don't need to read the entire article, but please do
skim it to get a sense of the main points and how it relates to the
summary. In particular, you should focus on two things when
evaluating summary quality:
- Does the summary cover the main points of the
story?
- Are the facts/references made in the summary represented in
the article? Or are facts misconstrued or made-up?
When we tested this HIT ourselves, we found it useful to
hide the sliders until after a short delay, so
the sliders are hidden for the first 15 seconds. They should then appear
automatically.