When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task
Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, Vladislav Maraev
Abstract
We introduce a multimodal framework for interactive drawing in a robot-assisted second language learning scenario. In this scenario, humans are asked to draw objects and spatial relations between them, while a social robot uses a vision-language model (VLM) to analyse whether the drawings are correct. The correctness decision passed to the human comes from a Wizard-of-Oz (WoZ) setup; we therefore use it to indirectly evaluate the quality of VLM predictions. We show that the task is very challenging for a VLM and that the approach to evaluating VLM performance matters: focusing on the correctness of predicting individual features (objects, relations) yields a different evaluation picture than evaluating the model on predicting the content of the image as a whole. We also examine, through a questionnaire, how the appearance of the social agent and the type of feedback influence participants' perception of the agent. Comparing verbal feedback generated by large language models against simple pattern-based feedback showed no significant effects, whereas changing the robot's appearance produced significant differences in user ratings of the agent's naturalness and social presence.
- Anthology ID:
- 2026.iwsds-1.25
- Volume:
- Proceedings of the 16th International Workshop on Spoken Dialogue System Technology
- Month:
- February
- Year:
- 2026
- Address:
- Trento, Italy
- Editors:
- Giuseppe Riccardi, Seyed Mahed Mousavi, Maria Ines Torres, Koichiro Yoshino, Zoraida Callejas, Shammur Absar Chowdhury, Yun-Nung Chen, Frederic Bechet, Joakim Gustafson, Géraldine Damnati, Alex Papangelis, Luis Fernando D’Haro, John Mendonça, Raffaella Bernardi, Dilek Hakkani-Tur, Giuseppe "Pino" Di Fabbrizio, Tatsuya Kawahara, Firoj Alam, Gokhan Tur, Michael Johnston
- Venue:
- IWSDS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 239–252
- URL:
- https://preview.aclanthology.org/dashboard-stats/2026.iwsds-1.25/
- Cite (ACL):
- Viktoria Paraskevi Daniilidou, Nikolai Ilinykh, and Vladislav Maraev. 2026. When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task. In Proceedings of the 16th International Workshop on Spoken Dialogue System Technology, pages 239–252, Trento, Italy. Association for Computational Linguistics.
- Cite (Informal):
- When social robots see our sketches: evaluating human perception of a robot and a VLM model performance in a drawing task (Daniilidou et al., IWSDS 2026)
- PDF:
- https://preview.aclanthology.org/dashboard-stats/2026.iwsds-1.25.pdf