Pedro Gabriel Campana
2026
FBK-NLP at ClinSkill QA 2026: Improving Temporal Reasoning via Keypoint-Augmented Inputs
Pedro Gabriel Campana | Alberto Lavelli | Bernardo Magnini
Proceedings of the BioNLP 2026 (Shared Tasks)
Understanding procedural skills from visual data is a key challenge in medical AI, especially for tasks that require reasoning over temporal sequences. We report on FBK-NLP's participation in the ClinSkill QA 2026 shared task, which requires models to arrange shuffled key frames into a coherent sequence of clinical actions and to explain the resulting order. We conduct a systematic study of prompting and reasoning strategies using an open, easily deployable vision-language model (VLM). The central finding of our study is that incorporating keypoint-based representations of people's body parts substantially improves the temporal reasoning required for frame ordering. Furthermore, we show that model performance is highly sensitive to prompt design and to seemingly minor factors such as filename ordering and the inclusion of domain information.