VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA
Andrea Menco Tovar, Jairo E. Serrano, Edwin Puertas, Juan Carlos Martinez-Santos
Abstract
This work addresses the task of temporally ordering clinical frames in the Basic Life Support (BLS) subset of ClinSkillQA. A two-stage hybrid pipeline based on Qwen2-VL-2B-Instruct in a zero-shot configuration is proposed. In Stage 1, each image is processed independently to extract factual visual evidence, which deterministic rules then convert into a structured representation. In Stage 2, ordering is formulated as ordinal scoring over procedural stages, with ties broken by applying PCA to multimodal embeddings. Evaluation followed the official benchmark protocol, using Task Accuracy, Pairwise Accuracy, and BERTScore. In the test phase, the system achieved a Task Accuracy of 0.17, a Pairwise Micro Accuracy of 0.60, and a BERTScore F1 of 0.71, with complete coverage of both predictions and rationales. The results provide an interpretable and reproducible foundation, although fine-grained temporal discrimination remains challenging.
- Anthology ID: 2026.bionlp-2.2
- Volume: Proceedings of the BioNLP 2026 (Shared Tasks)
- Month: July
- Year: 2026
- Address: San Diego, California, USA
- Editors: Deepak Gupta, Dina Demner-Fushman
- Venues: BioNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 6–12
- URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.2/
- Cite (ACL): Andrea Menco Tovar, Jairo E. Serrano, Edwin Puertas, and Juan Carlos Martinez-Santos. 2026. VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA. In Proceedings of the BioNLP 2026 (Shared Tasks), pages 6–12, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal): VerbaNexAI at ClinicalSkillQA: From Visual Evidence to Procedural Order A Two-Stage Generative Vision-Language Framework for ClinSkillQA (Menco Tovar et al., BioNLP 2026)
- PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.2.pdf
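The Stage 2 ordering step and the ordering metrics named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the per-frame ordinal stage scores are already available as integers (in the paper they come from Qwen2-VL-2B-Instruct), uses the projection onto the first principal component of the embeddings as the tie-breaker, and assumes the common definitions of exact-match task accuracy and micro-averaged pairwise accuracy; the official benchmark scorer may define them differently. The function names `order_frames`, `task_accuracy`, and `pairwise_micro_accuracy`, and the sign convention for the principal component, are hypothetical.

```python
import itertools

import numpy as np


def order_frames(stage_scores, embeddings):
    """Order frames by ordinal stage score, breaking ties with PCA (sketch).

    stage_scores: one integer stage index per frame (assumed already predicted).
    embeddings:   (n_frames, d) array of per-frame multimodal embeddings.
    Returns frame indices in predicted temporal order.
    """
    scores = np.asarray(stage_scores, dtype=float)
    X = np.asarray(embeddings, dtype=float)
    # Centre the embeddings and take the first principal axis via SVD.
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    axis = vt[0]
    # PCA signs are arbitrary; this hypothetical convention makes the
    # largest-magnitude loading positive so the tie-break is deterministic.
    if axis[np.argmax(np.abs(axis))] < 0:
        axis = -axis
    pc1 = Xc @ axis
    # np.lexsort uses the LAST key as primary: sort by stage, then by pc1.
    return np.lexsort((pc1, scores)).tolist()


def task_accuracy(preds, golds):
    """Fraction of instances whose predicted order matches gold exactly."""
    hits = sum(list(p) == list(g) for p, g in zip(preds, golds))
    return hits / len(golds)


def pairwise_micro_accuracy(preds, golds):
    """Correctly ordered frame pairs over all pairs, pooled across instances."""
    agree = total = 0
    for pred, gold in zip(preds, golds):
        pos = {frame: i for i, frame in enumerate(pred)}
        # combinations() keeps gold order, so a precedes b in the gold sequence.
        for a, b in itertools.combinations(gold, 2):
            total += 1
            agree += pos[a] < pos[b]
    return agree / total if total else 0.0
```

For example, with stage scores `[2, 0, 1, 1]` and two-dimensional embeddings `[[3, 0], [0, 0], [2, 0], [1, 0]]`, `order_frames` returns `[1, 3, 2, 0]`: the stage score fixes the coarse order, and the principal-component projection only decides between frames 2 and 3, which share stage 1. Keeping the ordinal stage as the primary sort key means the unsupervised PCA signal can never override a stage assignment, which matches the abstract's description of PCA as a tie-breaker only.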