Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha


Abstract
We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding. First, we study which aspects of the original benchmark can be solved by commonly used methods in multi-modal research. We find that a general-purpose Transformer-based model with cross-modal attention achieves strong performance on a majority of the gSCAN splits, surprisingly outperforming more specialized approaches from prior work. Furthermore, our analysis suggests that many of the remaining errors reveal the same fundamental challenge in systematic generalization of linguistic constructs regardless of visual context. Second, inspired by this finding, we propose challenging new tasks for gSCAN by generating data to incorporate relations between objects in the visual environment. Finally, we find that current models are surprisingly data inefficient given the narrow scope of commands in gSCAN, suggesting another challenge for future work.
Anthology ID:
2021.emnlp-main.166
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2180–2188
URL:
https://aclanthology.org/2021.emnlp-main.166
DOI:
10.18653/v1/2021.emnlp-main.166
Cite (ACL):
Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, and Fei Sha. 2021. Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2180–2188, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? (Qiu et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.166.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.166.mp4
Code
LauraRuis/groundedSCAN
Data
SCAN