Claudia-Alexandra Ursu
2026
MetaMiners at SMM4H-HeaRD 2026: A Semantic-Structural Knowledge-Enriched Ensemble for SARS-CoV-2 Metadata Identification
Claudia-Alexandra Ursu | Alecsandru-Florin Soare
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
This paper presents a hybrid system for binary classification of PubMed articles, designed to identify reports that associate clinical metadata with SARS-CoV-2 genomic sequences. The system targets the subtle distinction between reports of sequence-associated patient metadata and sentences in which such metadata is unrelated, irrelevant, or linked to previous studies. The main challenge is that the dataset is highly imbalanced: only 13.3% of the reports are labeled positive. Our system combines four approaches: dual-evidence tagging, negation-aware suppression, semantic frame extraction, and adversarial training. These approaches were tested on multiple models (BiomedBERT-base-abstract, BioLinkBERT-large, and PubMedBERT-base-fulltext), followed by a best-subset ensemble search that achieved an F1 score of 0.792, setting a new benchmark and placing the solution 1st in the competition.
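The best-subset ensemble search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how ensemble members are combined or how subsets are scored, so everything here is an assumption for illustration: each model is assumed to output a positive-class probability per validation example, members are combined by simple averaging, predictions use a fixed threshold, and subsets are ranked by F1 on a held-out set. The names `best_subset_ensemble`, `probs`, and `threshold` are hypothetical and not taken from the paper.

```python
from itertools import combinations


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_subset_ensemble(probs, y_true, threshold=0.5):
    """Exhaustively search all non-empty subsets of models.

    probs: dict mapping a model name to its list of positive-class
           probabilities on the validation set (assumed input format).
    Returns the subset (tuple of names) with the highest validation F1
    and that F1. Ties keep the first subset found, i.e. the smallest one.
    """
    names = sorted(probs)
    best_subset, best_f1 = (), -1.0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Assumption: members are combined by unweighted averaging.
            avg = [
                sum(probs[m][i] for m in subset) / k
                for i in range(len(y_true))
            ]
            preds = [1 if p >= threshold else 0 for p in avg]
            score = f1_score(y_true, preds)
            if score > best_f1:
                best_subset, best_f1 = subset, score
    return best_subset, best_f1
```

With n candidate models the search evaluates 2^n - 1 subsets, which is cheap for a handful of fine-tuned checkpoints. Because the positive class is rare (13.3%), selecting the subset on F1 rather than accuracy keeps the search focused on the minority class.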