When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task
Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, Vladislav Maraev
Abstract
We introduce a multimodal framework for interactive drawing in a robot-assisted second language learning scenario. In this scenario, humans are asked to draw objects and spatial relations between them, while a social robot uses a vision-language model (VLM) to analyse whether the drawings are correct. The correctness decision passed to the human comes from a Wizard-of-Oz (WoZ) setup; we therefore use it to indirectly evaluate the quality of VLM predictions. We show that the task is very challenging for a VLM and that the approach to evaluating VLM performance matters: focusing on the correctness of predicting individual features (objects, relations) yields a different evaluation picture than evaluating the model on predicting the content of the image as a whole. We also examine, through a questionnaire, how the appearance of the social agent and the type of feedback influence participants' perception of the agent. Comparing verbal feedback generated by large language models against simple pattern-based feedback showed no significant effects, whereas changing the robot's appearance produced significant differences in user ratings of the agent's naturalness and social presence.
- Anthology ID:
- 2026.iwsds-1.25
- Volume:
- Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
- Month:
- February
- Year:
- 2026
- Address:
- Trento, Italy
- Editors:
- Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
- Venue:
- IWSDS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 239–252
- URL:
- https://preview.aclanthology.org/dashboard-stats/2026.iwsds-1.25/
- Cite (ACL):
- Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, and Vladislav Maraev. 2026. When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 239–252, Trento, Italy. Association for Computational Linguistics.
- Cite (Informal):
- When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task (Daniilidou et al., IWSDS 2026)
- PDF:
- https://preview.aclanthology.org/dashboard-stats/2026.iwsds-1.25.pdf