Abstract
Extraction of spatial relations from sentences with complex/nested relationships is very challenging, as it often requires resolving inherent semantic ambiguities. We seek help from the visual modality to fill the information gap in the text modality and to resolve spatial semantic ambiguities. We use various recent vision-and-language datasets and techniques to train inter-modality alignment models and visual relationship classifiers, and we propose a novel global inference model that integrates these components into our structured output prediction model for spatial role and relation extraction. Our global inference model enables us to utilize the visual and geometric relationships between objects and improves the state-of-the-art results of spatial information extraction from text.
- Anthology ID:
- N18-2124
- Volume:
- Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
- Month:
- June
- Year:
- 2018
- Address:
- New Orleans, Louisiana
- Editors:
- Marilyn Walker, Heng Ji, Amanda Stent
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 788–794
- URL:
- https://aclanthology.org/N18-2124
- DOI:
- 10.18653/v1/N18-2124
- Cite (ACL):
- Taher Rahgooy, Umar Manzoor, and Parisa Kordjamshidi. 2018. Visually Guided Spatial Relation Extraction from Text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 788–794, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal):
- Visually Guided Spatial Relation Extraction from Text (Rahgooy et al., NAACL 2018)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/N18-2124.pdf
- Data
- Visual Genome
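
The paper formulates the integration of textual and visual evidence as a global inference over structured outputs; the snippet below is only a minimal, hypothetical Python sketch of the general idea described in the abstract: scoring candidate (trajector, spatial indicator, landmark) triples by combining a text-only spatial-relation score with a visual relationship score obtained through phrase-to-region alignment. All function and variable names, the precomputed score dictionaries, and the simple weighted-sum decision rule are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations

def infer_spatial_triples(phrases, regions,
                          text_scores, align_scores, visual_scores,
                          w_text=1.0, w_visual=1.0, threshold=1.0):
    """Combine textual and visual evidence for candidate spatial triples.

    text_scores:   dict (trajector, indicator, landmark) -> score from a
                    text-only spatial-relation classifier.
    align_scores:  dict (phrase, region) -> phrase/region alignment score.
    visual_scores: dict (region, region) -> visual/geometric relationship
                    classifier score.
    """
    predictions = []
    for tr, sp, lm in permutations(phrases, 3):
        # Ground the trajector and landmark phrases in the image via the
        # best-aligned regions.
        tr_region = max(regions, key=lambda r: align_scores.get((tr, r), 0.0))
        lm_region = max(regions, key=lambda r: align_scores.get((lm, r), 0.0))
        # Weighted combination of textual and visual evidence.
        score = (w_text * text_scores.get((tr, sp, lm), 0.0)
                 + w_visual * visual_scores.get((tr_region, lm_region), 0.0))
        if score >= threshold:
            predictions.append((tr, sp, lm, score))
    return predictions

# Toy example: "the book on the table" with two hypothetical image regions.
phrases = ["book", "on", "table"]
regions = ["region_1", "region_2"]
text_scores = {("book", "on", "table"): 0.8}
align_scores = {("book", "region_1"): 0.9, ("table", "region_2"): 0.7}
visual_scores = {("region_1", "region_2"): 0.6}

print(infer_spatial_triples(phrases, regions,
                            text_scores, align_scores, visual_scores))
# -> [('book', 'on', 'table', 1.4000000000000001)]
```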