In this HIT you will be presented with a news article and an automatically generated summary of that article. Your job is to rate the system's generation along 2 axes:

  • Coherence/Fluency: Is the system's generation grammatical, easy-to-read, and well-written?
  • Summary Quality: Does the system's generation meaningfully capture the main points in the article?

You will be able to rate each of the two axes on a scale from 1 to 5, with 1 being the lowest/worst and 5 the highest/best. The specific scales are:

  • Coherence/Fluency:
    • 5/5 (excellent): The summary itself is grammatical, fluent, and reasonable.
    • 4/5 (good): The summary generally makes sense, but there are minor grammatical errors or topical shifts that don't make for the best writing.
    • 3/5 (okay): I can see why this summary was generated, and it's somewhat readable, but there are problems that can't be ignored.
    • 2/5 (poor): Some parts of the summary could make sense, but it's unnatural, illogical, or quite hard to read.
    • 1/5 (terrible): The summary has nothing to do with the article, and/or there are severe errors in grammaticality or fluency.
  • Summary Quality:
    • 5/5 (very good): The summary correctly and completely captures the key points of the article.
    • 4/5 (mostly good): The summary captures most of the key points from the article.
    • 3/5 (neutral): While there are no egregious inconsistencies, there are major missing key points, or some partly incorrect summarizations.
    • 2/5 (mostly bad): The summary mentions some things from the article, but there are few reasonable points.
    • 1/5 (awful): The summary has nothing to do with the article or severely contradicts it.

You don't need to read the entire article, but please do skim it to get a sense of the main points and how it relates to the summary. In particular, you should focus on two things when evaluating summary quality:
  • Does the summary cover the main points of the story?
  • Are the facts and references in the summary supported by the article, or are they misconstrued or made up?
In early testing, when we did this HIT ourselves, we found it useful not to display the sliders until after a time delay, so the sliders are hidden for 15 seconds. They should appear automatically.
Article:
${prompt}
System's summary (rate this!):
${machine_completion}

Please take time to read the system's summary and to skim the article briefly, then rate the system's summary on the form below (appears in ~15s...). For "Summary Quality", please reference the article to check: 1) whether the summary contains the key points; and 2) whether the specific details mentioned in the summary are correct.

Is the system's generation grammatical, easy-to-read, and well-written?

I can see why this summary was generated, and it's somewhat readable, but there are problems that can't be ignored.

Does the system's generation meaningfully capture the main points in the article?

While there are no egregious inconsistencies, there are major missing key points, or some partly incorrect summarizations.

(Optional) Please let us know if anything was unclear, if you experienced any issues, or if you have any other feedback for us.