SubmissionNumber#=%=#41
FinalPaperTitle#=%=#Structured Temporal Grounding for Multimodal Medical Video Question Answering
ShortPaperTitle#=%=#
NumberOfPages#=%=#
CopyrightSigned#=%=#
JobTitle#==#
Organization#==#
Abstract#==#MedGenVidQA 2026 Task C evaluates temporal grounding for medical video question answering. The system receives a video and a question, then returns the start and end times of the visual answer. The UNCC submission treats this problem as evidence-guided interval selection rather than free-form timestamp generation. The code builds timestamped transcript tables, procedure phase maps, redundant candidate spans, schema-controlled ranking calls, boundary checks, and deterministic submission validation. Each predicted interval is therefore tied to stored transcript rows and to a limited candidate group. The official run ranked fifth among six participant systems, with 62.50 IoU@0.3, 36.25 IoU@0.5, 22.50 IoU@0.7, and 42.57 mIoU. The result indicates that transcript and phase evidence can recover the right procedural neighborhood, while strict localization still depends on finer ASR timing, denser visual checks, and learned duration calibration.
Author{1}{Firstname}#=%=#Hilmi
Author{1}{Lastname}#=%=#Demirhan
Author{1}{Username}#=%=#demirhan1
Author{1}{Orcid}#=%=#
Author{1}{Email}#=%=#hilmiunc@gmail.com
Author{1}{Affiliation}#=%=#University of North Carolina Wilmington
Author{2}{Firstname}#=%=#Wlodek
Author{2}{Lastname}#=%=#Zadrozny
Author{2}{Orcid}#=%=#
Author{2}{Email}#=%=#wzadrozn@charlotte.edu
Author{2}{Affiliation}#=%=#University of North Carolina Charlotte

==========
èéáğö