Foundation Models Meet Embodied Agents

Manling Li, Yunzhu Li, Jiayuan Mao, Wenlong Huang


Abstract
This tutorial will present a systematic overview of recent advances in foundation models for embodied agents, covering three types of foundation models, distinguished by their input and output modalities: Large Language Models (LLMs), Vision-Language Models (VLMs), and Vision-Language-Action Models (VLAs).
Anthology ID: 2025.naacl-tutorial.3
Volume: Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Maria Lomeli, Swabha Swayamdipta, Rui Zhang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 15–24
URL: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-tutorial.3/
Cite (ACL): Manling Li, Yunzhu Li, Jiayuan Mao, and Wenlong Huang. 2025. Foundation Models Meet Embodied Agents. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts), pages 15–24, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Foundation Models Meet Embodied Agents (Li et al., NAACL 2025)
PDF: https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-tutorial.3.pdf