FBK-NLP at ClinSkill QA 2026: Improving Temporal Reasoning via Keypoint-Augmented Inputs

Pedro Gabriel Campana, Alberto Lavelli, Bernardo Magnini


Abstract
Understanding procedural skills from visual data is a key challenge in medical AI, especially for tasks that require reasoning over temporal sequences. We report on FBK-NLP’s participation in the ClinSkill QA 2026 shared task, which requires models to arrange shuffled key frames into a coherent sequence of clinical actions and to explain the resulting order. We conduct a systematic study of prompting and reasoning strategies using an open and easily deployable vision-language model (VLM). Our central finding is that incorporating keypoint-based representations of people’s body parts substantially improves the temporal reasoning behind frame ordering. Furthermore, we show that model performance is highly sensitive to prompt design and to seemingly minor factors such as filename ordering and the inclusion of domain information.
Anthology ID:
2026.bionlp-2.14
Volume:
Proceedings of the BioNLP 2026 (Shared Tasks)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Deepak Gupta, Dina Demner-Fushman
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
92–98
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.14/
Cite (ACL):
Pedro Gabriel Campana, Alberto Lavelli, and Bernardo Magnini. 2026. FBK-NLP at ClinSkill QA 2026: Improving Temporal Reasoning via Keypoint-Augmented Inputs. In Proceedings of the BioNLP 2026 (Shared Tasks), pages 92–98, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
FBK-NLP at ClinSkill QA 2026: Improving Temporal Reasoning via Keypoint-Augmented Inputs (Campana et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.14.pdf
Supplementary material:
2026.bionlp-2.14.SupplementaryMaterial.zip