Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue
Aishwarya Padmakumar, Mert Inan, Spandana Gella, Patrick Lange, Dilek Hakkani-Tur
Abstract
Embodied task completion is a challenge where an agent in a simulated environment must predict environment actions to complete tasks based on natural language instructions and ego-centric visual observations. We propose a variant of this problem where the agent predicts actions at a higher level of abstraction called a plan, which helps make agent actions more interpretable and can be obtained from the appropriate prompting of large language models. We show that multimodal transformer models can outperform language-only models for this problem but fall significantly short of oracle plans. Since collecting human-human dialogues for embodied environments is expensive and time-consuming, we propose a method to synthetically generate such dialogues, which we then use as training data for plan prediction. We demonstrate that multimodal transformer models can attain strong zero-shot performance from our synthetic data, outperforming language-only models trained on human-human data.
- Anthology ID: 2023.emnlp-main.374
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 6114–6131
- URL: https://aclanthology.org/2023.emnlp-main.374
- DOI: 10.18653/v1/2023.emnlp-main.374
- Cite (ACL): Aishwarya Padmakumar, Mert Inan, Spandana Gella, Patrick Lange, and Dilek Hakkani-Tur. 2023. Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6114–6131, Singapore. Association for Computational Linguistics.
- Cite (Informal): Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue (Padmakumar et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2023.emnlp-main.374.pdf