Decoding Language Spatial Relations to 2D Spatial Arrangements
Gorjan Radevski, Guillem Collell, Marie-Francine Moens, Tinne Tuytelaars
Abstract
We address the problem of multimodal spatial understanding by decoding a set of language-expressed spatial relations to a set of 2D spatial arrangements in a multi-object and multi-relationship setting. We frame the task as arranging a scene of clip-arts given a textual description. We propose a simple and effective model architecture, Spatial-Reasoning Bert (SR-Bert), trained to decode text to 2D spatial arrangements in a non-autoregressive manner. SR-Bert can decode both explicit and implicit language to 2D spatial arrangements, generalizes to out-of-sample data to a reasonable extent, and can generate complete abstract scenes if paired with a clip-arts predictor. Finally, we qualitatively evaluate our method with a user study, validating that our generated spatial arrangements align with human expectations.
- Anthology ID:
- 2020.findings-emnlp.408
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4549–4560
- URL:
- https://aclanthology.org/2020.findings-emnlp.408
- DOI:
- 10.18653/v1/2020.findings-emnlp.408
- Cite (ACL):
- Gorjan Radevski, Guillem Collell, Marie-Francine Moens, and Tinne Tuytelaars. 2020. Decoding Language Spatial Relations to 2D Spatial Arrangements. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4549–4560, Online. Association for Computational Linguistics.
- Cite (Informal):
- Decoding Language Spatial Relations to 2D Spatial Arrangements (Radevski et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.findings-emnlp.408.pdf
- Code:
- gorjanradevski/sr-bert