Thank you for your participation in this and other similar HITs!

Please take a moment to familiarize yourself with this new HIT by reading the instructions/examples, because things have changed a bit. Thanks again for your work!

In this HIT you will be presented with a table from Wikipedia, with a few cells highlighted in yellow. Along with the table, you will be given some metadata, such as the title of the Wikipedia article/section. Based on this table, you will also be given a system generation, which aims to capture/summarize/describe the highlighted cells. Your job is to rate the generation along 2 axes:

  • Fluency/Grammaticality: Is the system's generation grammatical, easy-to-read, and fluent?
  • Correctness/Specificity: Does the generation correctly describe a fact from the table, and does that fact come from the highlighted cells?

You will rate each of the two axes on a scale from 1 to 5, with 1 being the lowest/worst and 5 the highest/best. The specific scales are:

  • Fluency/Grammaticality:
    • 5/5 (excellent): The generation is grammatical and fluent.
    • 4/5 (good): The sentence largely makes sense, but there are some small grammar issues/out-of-place words that don't make for the best writing.
    • 3/5 (okay): The grammar is okay and it's possible to read, but it definitely doesn't sound like a human wrote it.
    • 2/5 (poor): Even though I can kind of tell the meaning, it's difficult to read this unnatural sentence.
    • 1/5 (terrible): The generation has severe grammatical errors or is almost or completely unreadable.
  • Correctness/Specificity:
    • 5/5 (correct and based on the highlighted cells): The generation correctly describes the information conveyed in the highlighted cells.
    • 4/5 (mostly reasonable): The generation mostly describes the information in the highlighted cells with only small deviations.
    • 3/5 (neutral): The generation is somewhat plausible/relevant, but it's not as specific to the highlighted cells or correct as it could be.
    • 2/5 (mostly unreasonable): I see why this could be generated given the table/cells, but it doesn't make much sense.
    • 1/5 (wrong/nonsense/irrelevant): The generation doesn't seem to apply to the cells/table at all, or doesn't make any sense.

Notes:
  • For Fluency/Grammaticality, don't worry about correctness! There can be grammatical sentences that do not describe the associated table, and vice versa (see the examples).
  • For Correctness/Specificity, consider both the correctness of the statement given the table, and also its specificity to the highlighted cells: don't give 5/5 if the generation applies better to unhighlighted cells.
  • For Correctness/Specificity, it's okay if the generation references metadata such as the title; however, points should be deducted for "specificity" if there are other table cells that are referenced more directly.
  • A handful of tables are quite large! Still apply the same rating criteria, even if the tables have many rows.

Example 1:

Table from Wikipedia:

2012 Chicago Bears season


Section Title: Regular season
Table Section Text: Stats updated to the end of the season (Week 17).
Category | Player(s) | Value | NFL Rank | NFC Rank
Passing Yards | Jay Cutler | 3,033 yards | 24th | 12th
Passing Touchdowns | Jay Cutler | 19 TDs | 21st | 11th
Rushing Yards | Matt Forte | 1,094 yards | T-12th | 6th
Rushing Touchdowns | Michael Bush/Matt Forte | 5 TDs | T-20th | T-11th
Receptions | Brandon Marshall | 118 rec § | T-2nd | 2nd
Receiving Yards | Brandon Marshall | 1,508 yards § | 3rd | 2nd
Receiving Touchdowns | Brandon Marshall | 11 TDs | T-4th | 3rd
Points | Robbie Gould | 96 points | 25th | 13th
Kickoff Return Yards | Devin Hester | 295 yards | 22nd | 10th
Punt Return Yards | Devin Hester | 621 yards | 32nd | 6th
Tackles (combined) | Lance Briggs | 40 tackles | 40th | T-22nd
Sacks | Julius Peppers | 11.5 sacks | T-9th | T-5th
Interceptions | Tim Jennings | 9 INTs | 1st | 1st
System's generation (rate this!):
In 2012, Chicago Bears wide receiver Brandon Marshall had 1,508 receiving yards.
 
  • Fluency/Grammaticality: 5/5 Why? The sentence is grammatically correct and easy to read.
  • Correctness/Specificity: 5/5 Why? The highlighted cells are completely and correctly captured by the generation.
 

Example 2:

Table from Wikipedia:

Gerard Piqué


Section Title: International goals
Table Section Text: None
No. | Date | Venue | Cap | Opponent | Score | Result | Competition
1 | 28 March 2009 | Santiago Bernabéu Stadium, Madrid, Spain | 2 | Turkey | 1–0 | 1–0 | 2010 FIFA World Cup qualification
2 | 12 August 2009 | Philip II Arena, Skopje, Macedonia | 8 | North Macedonia | 2–2 | 3–2 | Friendly
3 | 5 September 2009 | Estadio Riazor, A Coruña, Spain | 9 | Belgium | 3–0 | 5–0 | 2010 FIFA World Cup qualification
4 | 14 October 2009 | Bilino Polje, Zenica, Bosnia and Herzegovina | 12 | Bosnia and Herzegovina | 1–0 | 5–2 |
5 | 13 June 2016 | Stadium Municipal, Toulouse, France | 78 | Czech Republic | 1–0 | 1–0 | UEFA Euro 2016
System's generation (rate this!):
Piqué scored an international goal at Stadium Municipal.
  • Fluency/Grammaticality: 4/5 Why? The sentence is grammatically correct, though the term "international goal" is a bit awkward; "a goal on the international stage" would have been better.
  • Correctness/Specificity: 3/5 Why? It might be correct, but the sentence focuses on the stadium where the player scored, and that cell is not highlighted.
 

Example 3:

Table from Wikipedia:

1899 San Diego mayoral election


Section Title: Election results
Table Section Text: None
Party | Candidate | Votes | %
Democratic | Edwin M. Capps | 1,714 | 52.3
Republican | Daniel C. Reed | 1,493 | 45.6
Socialist Labor | John Helphingstine | 70 | 2.1
Total votes | | 3,277 | 100
System's generation (rate this!):
Daniel C. Reed is a candidate 45.6 percent of the 1899 mayoral election results in San Diego, California. 45.6 --- Daniel C. Reed 1899 mayoral election result. cell
  • Fluency/Grammaticality: 2/5 Why? It's possible to fill in the gaps to make sense of things, but this is not a fluent sentence.
  • Correctness/Specificity: 3/5 Why? The sentence does seem to correctly copy the highlighted cells, but it's not a correct description of the cells; it's just mindlessly copying them.

Table from Wikipedia:
${prompt}
System's generation (rate this!):
${machine_completion}

Is the system's generation grammatical, easy-to-read, and fluent?

Does the generation correctly describe a fact from the table, and does that fact come from the highlighted cells?

(Optional) Please let us know if anything was unclear, if you experienced any issues, or if you have any other feedback for us.