Ebenezer Awotoro


2026

Systematic reviews are fundamental to evidence-based medicine, but the clinical evidence they contain is expressed primarily in unstructured text, which makes large-scale extraction and reuse difficult. Existing biomedical NLP methods achieve strong performance on span-level extraction from clinical trials and abstracts. These approaches are insufficient for systematic reviews, however, where evidence is often distributed across multiple studies, sentences, and sections and must be aggregated into normalized document-level attributes. We introduce VaxScope, a benchmark dataset for document-level structured evidence extraction from immunization-related systematic reviews. VaxScope is constructed through an expert-guided semi-automatic annotation pipeline that combines automatic candidate generation with domain-expert validation to ensure annotation consistency and quality. We formalize the task as document-level structured extraction: target labels are defined at the review level, and predicting them requires aggregating evidence beyond isolated textual spans. We establish baselines using abstract-only input and evaluate how much access to evidence-grounded contextual input improves performance over that setting. In these baseline experiments, PubMedBERT achieves the best overall performance (Avg F1: 0.850), and evidence-grounded input improves performance particularly for fields that require distributed contextual reasoning.
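
As an illustration of how a review-level score such as Avg F1 could be computed, the sketch below scores each field over all reviews and then takes the unweighted mean across fields. The abstract does not state the averaging scheme, so this macro-average is an assumption, and the field names (`vaccine`, `population`) and the helpers `field_f1` and `average_f1` are hypothetical, not part of VaxScope.

```python
from typing import Dict, List, Set

# A review is represented as: field name -> set of normalized values.
# This representation is an assumption made for illustration only.
Review = Dict[str, Set[str]]


def field_f1(gold: List[Review], pred: List[Review], field: str) -> float:
    """F1 for one field, pooling value-level counts over all reviews."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_vals = g.get(field, set())
        p_vals = p.get(field, set())
        tp += len(g_vals & p_vals)   # values predicted and in the gold label
        fp += len(p_vals - g_vals)   # values predicted but not in the gold label
        fn += len(g_vals - p_vals)   # gold values that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(gold: List[Review], pred: List[Review], fields: List[str]) -> float:
    """Unweighted mean of per-field F1 (assumed averaging scheme)."""
    return sum(field_f1(gold, pred, f) for f in fields) / len(fields)
```

Under this scheme every field contributes equally to the average regardless of how many values it has, so a field that depends on evidence scattered across a review can lower the overall score even when simpler fields are predicted perfectly.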