Andreas Voss
2026
Dr-BERT-NL at #SMM4H–HeaRD 2026: DOKTERBERT – Ontology-Grounded Contextual Representations for Dutch Clinical NLP
Gijs Danoe | Andreas Voss | Axel Hamprecht | Matthijs S. Berends
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
We describe our submission to SMM4H-HeaRD 2026 Task 7, which asks systems to label ClinicalImpacts and SocialImpacts spans in Reddit posts about non-medical substance use. We compare four pipeline shapes built on the same DeBERTa-v3-base backbone: (i) a direct 5-class encoder with a linear-chain CRF head, (ii) a two-stage detect-then-classify pipeline that delegates span typing to an instruction-tuned LLM (Qwen2.5-7B or Gemma-3-12B, 4-bit NF4), (iii) an audit pipeline in which the same LLM verifies the encoder's predictions, and (iv) a classical-ML variant that replaces the LLM with an SVM trained on encoder span embeddings. Across 16 configurations, the encoder-only DeBERTa-v3 + CRF configuration is the strongest single system on the official test split, reaching 45.4% strict and 54.2% relaxed F1, which is +8.6 / +5.3 points above a mental-roberta-base baseline. LLM audits give a small dev gain that does not transfer to test.
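The abstract reports both strict and relaxed F1 without defining them. As a minimal sketch of how such span-level scores are commonly computed, the Python below assumes that "strict" means an exact match on boundaries and label, and that "relaxed" means any character overlap with the same label. These definitions, the `Span` layout, and the function names `overlaps` and `span_f1` are illustrative assumptions, not the official Task 7 scorer.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_offset, end_offset_exclusive, label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Span-level F1 under the assumed strict or relaxed matching rule."""
    gold_set, pred_set = set(gold), set(pred)
    if relaxed:
        # A prediction counts if it overlaps any gold span of the same label,
        # and a gold span counts if any such prediction overlaps it.
        hit_pred = sum(any(overlaps(p, g) for g in gold_set) for p in pred_set)
        hit_gold = sum(any(overlaps(g, p) for p in pred_set) for g in gold_set)
    else:
        # Strict: boundaries and label must match exactly.
        hit_pred = hit_gold = len(gold_set & pred_set)
    precision = hit_pred / len(pred_set) if pred_set else 0.0
    recall = hit_gold / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [(10, 25, "ClinicalImpacts"), (40, 60, "SocialImpacts")]
    pred = [
        (10, 25, "ClinicalImpacts"),  # exact match
        (45, 60, "SocialImpacts"),    # partial overlap, counts only when relaxed
        (70, 80, "SocialImpacts"),    # spurious
    ]
    print(round(span_f1(gold, pred), 3))                # 0.4
    print(round(span_f1(gold, pred, relaxed=True), 3))  # 0.8
```

On this toy input the partially overlapping prediction is rewarded only under the relaxed rule, which is the kind of boundary error that would separate the two reported scores if the task uses definitions like these.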