SubmissionNumber#=%=#41
FinalPaperTitle#=%=#Structured Temporal Grounding for Multimodal Medical Video Question Answering
ShortPaperTitle#=%=#
NumberOfPages#=%=#
CopyrightSigned#=%=#
JobTitle#==#
Organization#==#
Abstract#==#MedGenVidQA 2026 Task C evaluates temporal grounding for medical video question answering. The system receives a video and a question, then returns the start and end time of the visual answer. The UNCC submission treats this problem as evidence-guided interval selection rather than free-form timestamp generation. The code builds timestamped transcript tables, procedure phase maps, redundant candidate spans, schema-controlled ranking calls, boundary checks, and deterministic submission validation. Each predicted interval is therefore tied to stored transcript rows and to a limited candidate group. The official run ranked fifth among six participant systems, with 62.50 IoU@0.3, 36.25 IoU@0.5, 22.50 IoU@0.7, and 42.57 mIoU. The result indicates that transcript and phase evidence can recover the right procedural neighborhood, while strict localization still depends on finer ASR timing, denser visual checks, and learned duration calibration.
Author{1}{Firstname}#=%=#Hilmi
Author{1}{Lastname}#=%=#Demirhan
Author{1}{Username}#=%=#demirhan1
Author{1}{Orcid}#=%=#
Author{1}{Email}#=%=#hilmiunc@gmail.com
Author{1}{Affiliation}#=%=#University of North Carolina Wilmington
Author{2}{Firstname}#=%=#Wlodek
Author{2}{Lastname}#=%=#Zadrozny
Author{2}{Orcid}#=%=#
Author{2}{Email}#=%=#wzadrozn@charlotte.edu
Author{2}{Affiliation}#=%=#University of North Carolina Charlotte
==========
èéáğö