Visually Grounded Continual Learning of Compositional Phrases

Xisen Jin; Junyi Du; Arka Sadhu; Ram Nevatia; Xiang Ren

doi:10.18653/v1/2020.emnlp-main.158

Abstract

Humans acquire language continually, with much more limited access to data samples at any one time than contemporary NLP systems. To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task that simulates the continual acquisition of compositional phrases from streaming visual scenes. In the task, models are trained on a paired image-caption stream with a shifting object distribution, while being constantly evaluated by a visually grounded masked language prediction task on held-out test sets. VisCOLL compounds the challenges of continual learning (i.e., learning from a continuously shifting data distribution) and compositional generalization (i.e., generalizing to novel compositions). To facilitate research on VisCOLL, we construct two datasets, COCO-shift and Flickr-shift, and benchmark them using different continual learning methods. Results reveal that SoTA continual learning approaches provide little to no improvement on VisCOLL, since storing examples of all possible compositions is infeasible. We conduct further ablations and analysis to guide future work.

Anthology ID: 2020.emnlp-main.158
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2018–2029
URL: https://aclanthology.org/2020.emnlp-main.158
DOI: 10.18653/v1/2020.emnlp-main.158
Cite (ACL): Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, and Xiang Ren. 2020. Visually Grounded Continual Learning of Compositional Phrases. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2018–2029, Online. Association for Computational Linguistics.
Cite (Informal): Visually Grounded Continual Learning of Compositional Phrases (Jin et al., EMNLP 2020)
PDF: https://preview.aclanthology.org/naacl24-info/2020.emnlp-main.158.pdf
Video: https://slideslive.com/38938894
Code: INK-USC/VG-CCL
Data: Flickr30k, MS COCO
