Abstract
We present VOILA: an optimised, multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human user. VOILA is: (1) able to learn new visual categories interactively from users from scratch; (2) trained on real human-human dialogues in the same domain, and so is able to conduct natural spontaneous dialogue; (3) optimised to find the most effective trade-off between the accuracy of the visual categories it learns and the cost it incurs to users. VOILA is deployed on Furhat, a human-like, multi-modal robot head with back-projection of the face, and a graphical virtual character.
- Anthology ID: W17-5524
- Volume: Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
- Month: August
- Year: 2017
- Address: Saarbrücken, Germany
- Editors: Kristiina Jokinen, Manfred Stede, David DeVault, Annie Louis
- Venue: SIGDIAL
- SIG: SIGDIAL
- Publisher: Association for Computational Linguistics
- Pages: 197–200
- URL: https://aclanthology.org/W17-5524
- DOI: 10.18653/v1/W17-5524
- Cite (ACL): Yanchao Yu, Arash Eshghi, and Oliver Lemon. 2017. VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System). In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 197–200, Saarbrücken, Germany. Association for Computational Linguistics.
- Cite (Informal): VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System) (Yu et al., SIGDIAL 2017)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/W17-5524.pdf