Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples

Philipp Sadler; David Schlangen

Abstract

NLP tasks are typically defined extensionally through datasets containing example instantiations (e.g., pairs of image _i_ and text _t_), but motivated intensionally through capabilities invoked in verbal descriptions of the task (e.g., “_t_ is a description of _i_, for which the content of _i_ needs to be recognised and understood”). We present Pento-DIARef, a diagnostic dataset in a visual domain of puzzle pieces where referring expressions are generated by a well-known symbolic algorithm (the “Incremental Algorithm”), which itself is motivated by appeal to a hypothesised capability (eliminating distractors through application of Gricean maxims). Our question then is whether the extensional description (the dataset) is sufficient for a neural model to pick up the underlying regularity and exhibit this capability given the simple task definition of producing expressions from visual inputs. We find that a model supported by a vision detection step and a targeted data generation scheme achieves an almost perfect BLEU@1 score and sentence accuracy, whereas simpler baselines do not.
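
For readers unfamiliar with the Incremental Algorithm mentioned in the abstract, the following is a minimal sketch of its distractor-elimination idea, not the authors' implementation. The attribute names (`color`, `shape`, `position`), their preference order, the example pieces, and the function name `incremental_algorithm` are all assumptions made for illustration.

```python
from typing import Dict, List

# A piece is described by attribute/value pairs (hypothetical attribute names).
Piece = Dict[str, str]


def incremental_algorithm(
    target: Piece, distractors: List[Piece], preference_order: List[str]
) -> Dict[str, str]:
    """Select target attributes, in preference order, that rule out distractors."""
    remaining = list(distractors)
    selected: Dict[str, str] = {}
    for attribute in preference_order:
        if not remaining:
            break  # the target is already uniquely identified
        value = target[attribute]
        # Keep only the distractors that this attribute fails to exclude.
        still_confusable = [d for d in remaining if d[attribute] == value]
        if len(still_confusable) < len(remaining):
            # The attribute eliminates at least one distractor, so mention it.
            selected[attribute] = value
            remaining = still_confusable
    return selected


# Illustrative scene: the target and two distractors (made-up values).
target = {"color": "red", "shape": "T", "position": "top left"}
distractors = [
    {"color": "red", "shape": "X", "position": "top left"},
    {"color": "blue", "shape": "T", "position": "bottom right"},
]
selected = incremental_algorithm(target, distractors, ["color", "shape", "position"])
print(selected)
```

In this sketch, an attribute is mentioned only if it excludes at least one remaining distractor, and selection stops as soon as none are left, so the position is never mentioned for the example scene. Turning the selected attributes into a surface expression is a separate step that the sketch leaves out.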

Anthology ID: 2023.eacl-main.154
Volume: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month: May
Year: 2023
Address: Dubrovnik, Croatia
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2106–2122
URL: https://aclanthology.org/2023.eacl-main.154
Cite (ACL): Philipp Sadler and David Schlangen. 2023. Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2106–2122, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal): Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples (Sadler & Schlangen, EACL 2023)
PDF: https://preview.aclanthology.org/starsem-semeval-split/2023.eacl-main.154.pdf
