Bram van Es

2026

DT4H.nl at #SMM4H-HeaRD 2026: Multilingual Clinical NER with multilingual and monolingual models
Bram van Es
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks

We describe the setup we used for the MultiClinAI-NER task at the SMM4H-HeaRD 2026 workshop. We employed a dedicated multilingual encoder model (EuroBERT-610m), two Dutch encoder models trained from scratch on clinical corpora (MedRoBERTa.nl and CardioDeBERTa.nl), and a generic Dutch encoder model (RobBERT2023-large), all fine-tuned with a 3-layer DNN head. We find that multilingual datasets are potentially beneficial for augmenting the training corpora of monolingual models.

2025

LAD: LoRA-Adapted Diffusion
Ruurd Jan Anthonius Kuiper | Lars de Groot | Bram van Es | Maarten van Smeden | Ayoub Bagheri
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Autoregressive models dominate text generation but suffer from left-to-right decoding constraints that limit efficiency and bidirectional reasoning. Diffusion-based models offer a flexible alternative but face challenges in adapting to discrete text efficiently. We propose LAD (LoRA-Adapted Diffusion), a framework for non-autoregressive generation that adapts LLaMA models for iterative, bidirectional sequence refinement using LoRA adapters. LAD employs a structural denoising objective combining masking with text perturbations (swaps, duplications and span shifts), enabling full sequence editing during generation. We aim to demonstrate that LAD could be a viable and efficient alternative to training diffusion models from scratch, by providing both validation results and two interactive demos directly available online: https://ruurdkuiper.github.io/tini-lad/ and https://huggingface.co/spaces/Ruurd/tini-lad. Inference and training code: https://github.com/RuurdKuiper/lad-code
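The corruption side of a structural denoising objective, as the abstract describes it (masking combined with swaps, duplications and span shifts), can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `[MASK]` token string, the masking rate, the span length, and the choice to apply one structural perturbation before masking are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, chosen for illustration only


def mask_tokens(tokens, rng, rate=0.3):
    """Replace each token with MASK independently with probability `rate`."""
    return [MASK if rng.random() < rate else t for t in tokens]


def swap_adjacent(tokens, rng):
    """Swap one randomly chosen pair of neighbouring tokens."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def duplicate_token(tokens, rng):
    """Insert a copy of one randomly chosen token next to itself."""
    out = list(tokens)
    if not out:
        return out
    i = rng.randrange(len(out))
    out.insert(i, out[i])
    return out


def shift_span(tokens, rng, span_len=2):
    """Cut a contiguous span and re-insert it at a different position."""
    out = list(tokens)
    if len(out) <= span_len:
        return out
    start = rng.randrange(len(out) - span_len + 1)
    span = out[start:start + span_len]
    rest = out[:start] + out[start + span_len:]
    # Exclude the original position so the span actually moves.
    dest = rng.choice([p for p in range(len(rest) + 1) if p != start])
    return rest[:dest] + span + rest[dest:]


def corrupt(tokens, seed=0, mask_rate=0.3):
    """Apply one structural perturbation, then masking (assumed ordering).

    Returns (corrupted, target); a denoiser would be trained to map the
    corrupted sequence back to the clean target.
    """
    rng = random.Random(seed)
    perturb = rng.choice([swap_adjacent, duplicate_token, shift_span])
    corrupted = mask_tokens(perturb(tokens, rng), rng, mask_rate)
    return corrupted, list(tokens)


if __name__ == "__main__":
    clean = "the patient was admitted with chest pain".split()
    noisy, target = corrupt(clean, seed=1)
    print(noisy)
    print(target)
```

Because duplications change the sequence length, a denoiser trained on such pairs has to learn to delete and reorder tokens as well as fill in masked positions, which matches the abstract's description of full sequence editing.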
