Abstract
Communication between humans and mobile agents is becoming increasingly important as such agents are widely deployed in our daily lives. Vision-and-Dialogue Navigation is one of the tasks that evaluate an agent's ability to interact with humans for assistance and to navigate based on natural language responses. In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model built upon Vision-Language transformers. However, despite achieving competitive performance, we find that the agent in the NDH task is not evaluated appropriately by the primary metric, Goal Progress. By analyzing the performance mismatch between Goal Progress and other metrics (e.g., normalized Dynamic Time Warping) from our state-of-the-art model, we show that NDH's sub-path based task setup (i.e., navigating a partial trajectory based on the corresponding subset of the full dialogue) does not provide the agent with enough supervision signal towards the goal region. Therefore, we propose a new task setup called NDH-Full, which takes the full dialogue and the whole navigation path as one instance. We present a strong baseline model and show initial results on this new task. We further describe several approaches that we tried in order to improve model performance (based on curriculum learning, pre-training, and data augmentation), suggesting potentially useful training methods for the new NDH-Full task.
- Anthology ID:
- 2021.emnlp-main.518
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6432–6442
- URL:
- https://aclanthology.org/2021.emnlp-main.518
- DOI:
- 10.18653/v1/2021.emnlp-main.518
- Cite (ACL):
- Hyounghun Kim, Jialu Li, and Mohit Bansal. 2021. NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6432–6442, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- NDH-Full: Learning and Evaluating Navigational Agents on Full-Length Dialogue (Kim et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2021.emnlp-main.518.pdf
- Code:
- hyounghk/ndh-full
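
The abstract contrasts Goal Progress with normalized Dynamic Time Warping (nDTW). The following is a minimal sketch of how the two metrics can disagree, not the authors' evaluation code. It assumes paths are lists of 2D coordinates and uses Euclidean distance, whereas navigation benchmarks typically use shortest-path distance on the environment graph. The function names, the toy paths, and the 3.0 m threshold are illustrative assumptions.

```python
import math

def goal_progress(path, goal):
    """Reduction in distance to the goal between the first and last position."""
    return math.dist(path[0], goal) - math.dist(path[-1], goal)

def ndtw(pred, ref, threshold=3.0):
    """Normalized DTW: exp(-DTW(pred, ref) / (len(ref) * threshold)).

    `threshold` is an assumed success distance in meters.
    """
    n, m = len(pred), len(ref)
    inf = float("inf")
    # dtw[i][j] = minimal cumulative cost of aligning pred[:i] with ref[:j]
    dtw = [[inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(pred[i - 1], ref[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return math.exp(-dtw[n][m] / (m * threshold))

# Toy example (hypothetical coordinates): both paths end at the goal,
# but the second one follows a different route than the reference.
reference = [(0.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
detour = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
goal = reference[-1]

gp_reference = goal_progress(reference, goal)
gp_detour = goal_progress(detour, goal)
ndtw_reference = ndtw(reference, reference)
ndtw_detour = ndtw(detour, reference)
```

In this toy case both paths obtain the same Goal Progress because they end at the same point, while nDTW penalizes the detour for deviating from the reference path. Goal Progress depends only on the endpoints, so it carries no information about path fidelity.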