zzunlp at ClinicalSkillQA: Perceive-and-Plan with Decomposed In-Context Learning and Saliency-Guided Perception for Clinical Skill Keyframe Reordering
Bin Huang, Yi Luo, Zhontian Hua, Guanghui Zhao, Kaixuan Yuan, Kunli Zhang
Abstract
Multimodal Large Language Models (MLLMs) show strong medical visual understanding, but their capability for continuous perception in procedural clinical workflows remains underexplored. We present Perceive-and-Plan, a decomposed in-context learning paradigm for clinical skill keyframe reordering. The method separates visual perception from temporal planning via two stages: (1) structured visual perception with saliency-guided Picture-in-Picture (PiP) composition that magnifies critical regions (head, chest) as color-coded insets, and (2) temporal reasoning with chain-style self-verification via fresh conversation reset and visual-evidence anchoring (BLS Rules R1–R11). Without parameter updates, our system scores 71.43 overall (2nd place, ClinSkill QA 2026), with 0.86 pairwise accuracy and 1.0 rationale coverage. Structured prompting with visual saliency guidance measurably improves MLLMs' procedural understanding. Our code is published at https://github.com/NanceTide/clinskillqa-perceive-and-plan.
- Anthology ID:
- 2026.bionlp-2.4
- Volume:
- Proceedings of the BioNLP 2026 (Shared Tasks)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Deepak Gupta, Dina Demner-Fushman
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24–32
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.4/
- Cite (ACL):
- Bin Huang, Yi Luo, Zhontian Hua, Guanghui Zhao, Kaixuan Yuan, and Kunli Zhang. 2026. zzunlp at ClinicalSkillQA: Perceive-and-Plan with Decomposed In-Context Learning and Saliency-Guided Perception for Clinical Skill Keyframe Reordering. In Proceedings of the BioNLP 2026 (Shared Tasks), pages 24–32, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- zzunlp at ClinicalSkillQA: Perceive-and-Plan with Decomposed In-Context Learning and Saliency-Guided Perception for Clinical Skill Keyframe Reordering (Huang et al., BioNLP 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-2.4.pdf
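The abstract reports 0.86 pairwise accuracy for the reordered keyframes. The following is a minimal sketch that assumes pairwise accuracy means the fraction of keyframe pairs whose relative order in the predicted sequence agrees with the gold sequence. The official ClinSkill QA scorer may define it differently. The function name `pairwise_accuracy` and the keyframe IDs are hypothetical and are used only for illustration.

```python
from itertools import combinations


def pairwise_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of keyframe pairs ordered the same way in `predicted` and `gold`.

    Hypothetical definition (assumption): both sequences are permutations of
    the same set of unique keyframe IDs; the shared-task scorer may differ.
    """
    if (
        len(set(gold)) != len(gold)
        or len(predicted) != len(gold)
        or set(predicted) != set(gold)
    ):
        raise ValueError("predicted and gold must be permutations of the same unique keyframes")
    if len(gold) < 2:
        return 1.0  # no pairs to compare
    # Position of each keyframe in the predicted sequence.
    pred_pos = {frame: idx for idx, frame in enumerate(predicted)}
    # combinations() preserves input order, so each (a, b) has a before b in gold.
    pairs = list(combinations(gold, 2))
    correct = sum(1 for a, b in pairs if pred_pos[a] < pred_pos[b])
    return correct / len(pairs)


if __name__ == "__main__":
    gold = ["A", "B", "C", "D"]
    predicted = ["A", "C", "B", "D"]  # one adjacent swap
    print(pairwise_accuracy(predicted, gold))  # → 0.8333333333333334 (5 of 6 pairs)
```

Under this definition, swapping one adjacent pair in a four-frame sequence breaks only one of the six pairs. A pairwise score therefore gives partial credit to near-correct orderings that an exact-match criterion would count as complete failures.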