MedCAT v2: a modular, extensible architecture for clinical named entity recognition and linking under real-world privacy and compute constraints

Mart Ratas, Thomas Searle, Adam Sutton, Richard Dobson


Abstract
MedCAT is an open-source framework for clinical named entity recognition and linking (NER+L) widely used in research and healthcare settings. We present MedCAT v2, a re-engineered version designed to improve modularity, extensibility, and maintainability while preserving the core functionality and performance of previous releases. The new architecture introduces a registry-based component system and a flexible pipeline that enables easy substitution of components, integration of alternative methods, and future expansion, including support for pre-trained components across the full NER+L and contextualisation workflow. This enables systematic exploration of clinical NER+L design trade-offs by evaluating different components in the pipeline. Evaluation across multiple public datasets shows equivalent or improved performance compared to earlier versions, with reduced integration overhead and improved runtime flexibility. The framework also supports optional extensions such as meta-annotation and relation extraction, providing a unified and reproducible environment for clinical NLP in real-world settings.
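The abstract describes a registry-based component system in which pipeline stages can be substituted without changing the pipeline itself. The Python sketch below illustrates that general pattern only; it is a minimal sketch under stated assumptions, not MedCAT v2's actual API. Every name in it (`register`, `create`, `Pipeline`, the toy `DictLookupNER`, `ToyLinker` and `NoOpLinker` components, the `cui` field and the `TOY:` concept identifiers) is hypothetical.

```python
from typing import Callable, Dict, List, Protocol, Tuple

# Hypothetical sketch of a registry-based pipeline; NOT MedCAT v2's real API.


class Component(Protocol):
    """Any pipeline stage: takes a document dict and returns it (possibly enriched)."""

    def __call__(self, doc: dict) -> dict: ...


# Registry: component type -> registered name -> zero-argument factory.
_REGISTRY: Dict[str, Dict[str, Callable[[], Component]]] = {}


def register(comp_type: str, name: str):
    """Decorator that records a factory (here, a class) under a type and name."""
    def wrap(factory):
        _REGISTRY.setdefault(comp_type, {})[name] = factory
        return factory
    return wrap


def create(comp_type: str, name: str) -> Component:
    """Instantiate a registered component, failing clearly if it is unknown."""
    try:
        factory = _REGISTRY[comp_type][name]
    except KeyError:
        raise KeyError(f"no {comp_type!r} component registered as {name!r}") from None
    return factory()


class Pipeline:
    """Runs components in order; the spec is a list of (type, name) pairs."""

    def __init__(self, spec: List[Tuple[str, str]]):
        self.components = [create(t, n) for t, n in spec]

    def __call__(self, text: str) -> dict:
        doc = {"text": text, "entities": []}
        for component in self.components:
            doc = component(doc)
        return doc


# Toy components registered by name (hypothetical vocabulary and concept IDs).
@register("ner", "dict_lookup")
class DictLookupNER:
    def __init__(self) -> None:
        self.vocab = {"diabetes", "hypertension"}

    def __call__(self, doc: dict) -> dict:
        tokens = [t.strip(".,;") for t in doc["text"].lower().split()]
        doc["entities"] = [{"text": t} for t in tokens if t in self.vocab]
        return doc


@register("linker", "toy_linker")
class ToyLinker:
    def __init__(self) -> None:
        self.concepts = {"diabetes": "TOY:0001", "hypertension": "TOY:0002"}

    def __call__(self, doc: dict) -> dict:
        for ent in doc["entities"]:
            ent["cui"] = self.concepts.get(ent["text"])
        return doc


@register("linker", "noop")
class NoOpLinker:
    def __call__(self, doc: dict) -> dict:
        for ent in doc["entities"]:
            ent["cui"] = None  # leave every entity unlinked
        return doc


pipe = Pipeline([("ner", "dict_lookup"), ("linker", "toy_linker")])
print(pipe("Patient has diabetes and hypertension.")["entities"])
# [{'text': 'diabetes', 'cui': 'TOY:0001'}, {'text': 'hypertension', 'cui': 'TOY:0002'}]

# Swapping the linker changes only the spec, not the pipeline code.
swapped = Pipeline([("ner", "dict_lookup"), ("linker", "noop")])
```

Because the pipeline is assembled from (type, name) pairs, substituting one component for another is a one-line change to that list. This is the kind of flexibility the abstract attributes to the v2 design, shown here with toy components rather than real NER+L models.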
Anthology ID:
2026.bionlp-1.17
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
191–198
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.17/
Cite (ACL):
Mart Ratas, Thomas Searle, Adam Sutton, and Richard Dobson. 2026. MedCAT v2: a modular, extensible architecture for clinical named entity recognition and linking under real-world privacy and compute constraints. In BioNLP 2026, pages 191–198, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
MedCAT v2: a modular, extensible architecture for clinical named entity recognition and linking under real-world privacy and compute constraints (Ratas et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.17.pdf