Abstract
We discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing ‘deep’ linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail, as compared to a single performance value on a static and monolithic dataset.
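To make the idea of controlled, artificial evaluation data concrete, the sketch below shows how an abstract scene and a caption with a truth value known by construction might be generated, so that a single linguistic phenomenon (here, a spatial relation) can be probed in isolation. This is a minimal illustration in plain Python, not the authors' ShapeWorld code; all names, the attribute inventories, and the spatial-relation example are assumptions for illustration only.

```python
# Hypothetical sketch, not the authors' ShapeWorld implementation:
# generate a tiny abstract scene and a caption whose truth value is
# computed from the scene itself, so model agreement can be measured
# exactly for one targeted phenomenon (spatial relations).
import random
from dataclasses import dataclass

SHAPES = ["square", "circle", "triangle"]
COLORS = ["red", "green", "blue"]

@dataclass
class Entity:
    shape: str
    color: str
    x: float
    y: float

def sample_scene(num_entities: int = 3) -> list:
    """Sample a scene of randomly placed, randomly attributed entities."""
    return [
        Entity(random.choice(SHAPES), random.choice(COLORS),
               random.random(), random.random())
        for _ in range(num_entities)
    ]

def spatial_caption(scene: list) -> tuple:
    """Generate an 'X is to the left of Y' caption and its ground-truth label."""
    a, b = random.sample(scene, 2)
    caption = f"A {a.color} {a.shape} is to the left of a {b.color} {b.shape}."
    # The label follows from the scene geometry, so it is correct by construction.
    return caption, a.x < b.x

if __name__ == "__main__":
    scene = sample_scene()
    caption, label = spatial_caption(scene)
    print(caption, label)
```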
- Anthology ID: W18-1003
- Volume: Proceedings of the Workshop on Generalization in the Age of Deep Learning
- Month: June
- Year: 2018
- Address: New Orleans, Louisiana
- Editors: Yonatan Bisk, Omer Levy, Mark Yatskar
- Venue: Gen-Deep
- Publisher: Association for Computational Linguistics
- Pages: 17–23
- URL: https://aclanthology.org/W18-1003
- DOI: 10.18653/v1/W18-1003
- Cite (ACL): Alexander Kuhnle and Ann Copestake. 2018. Deep learning evaluation using deep linguistic processing. In Proceedings of the Workshop on Generalization in the Age of Deep Learning, pages 17–23, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal): Deep learning evaluation using deep linguistic processing (Kuhnle & Copestake, Gen-Deep 2018)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/W18-1003.pdf
- Data: CLEVR, MS COCO, NLVR, SHAPES, ShapeWorld, Visual Question Answering, Visual Question Answering v2.0