Abstract
We present Barch, a new English dataset of human-written summaries describing bar charts. The dataset contains 47 charts covering a selection of 18 topics. Each chart is associated with one of four intended messages, which is expressed in the chart title. Using crowdsourcing, we collected around 20 summaries per chart, roughly one thousand in total. The text of the summaries is aligned both with the chart data and with the analytical inferences about the data that humans drew. Our dataset is one of the first to explore how intended messages affect the data descriptions in chart summaries. It is also well suited to training data-driven systems for chart-to-text generation. We report the performance of state-of-the-art neural generation models trained on this dataset and discuss the strengths and shortcomings of the different models.
- Anthology ID:
- 2022.lrec-1.380
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 3552–3560
- URL:
- https://aclanthology.org/2022.lrec-1.380
- Cite (ACL):
- Iza Škrjanec, Muhammad Salman Edhi, and Vera Demberg. 2022. Barch: an English Dataset of Bar Chart Summaries. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3552–3560, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Barch: an English Dataset of Bar Chart Summaries (Škrjanec et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2022.lrec-1.380.pdf
- Data:
- AutoChart, Chart2Text