URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow

Natalia Madrueño, Jose Walter Hernández Pérez, Rubén R. Fernández, Soto Montalvo


Abstract
TNM cancer staging is critical for characterizing tumor burden and guiding clinical decisions. However, its automated extraction remains challenging because free-text pathology reports are unstructured and heterogeneous. This paper describes the participation of the URJC-Team in Task 6 of the Social Media Mining for Health/Health Real-World Data (#SMM4H-HeaRD) 2026 Shared Tasks, which focuses on predicting TNM staging from pathology reports. The proposed workflow combines hand-crafted regular expressions with a Large Language Model (LLM): explicit TNM mentions are first extracted using rule-based patterns, and any stage not recovered by these rules is then inferred by an LLM. Overall, the proposal achieves competitive results across all official shared-task phases.
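The two-step workflow in the abstract (rules first, LLM only for what the rules miss) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the patterns, the `extract_tnm` function, and the `llm_infer` callback are all hypothetical names introduced here, and the regular expressions are deliberately simple. They would, for example, also match "T2" in "T2-weighted MRI".

```python
import re
from typing import Callable, Dict


def _pattern(letter: str, values: str) -> "re.Pattern[str]":
    # Optional staging prefix (y, c, r, p, a), the category letter, then its value.
    # The lookbehind rejects letters glued to an uppercase word; the lookahead
    # rejects longer numbers such as the vertebra label "T10".
    return re.compile(rf"(?<![A-Z])[ycrpa]{{0,2}}{letter}({values})(?![0-9])")


# Assumed, simplified value sets; sub-categories such as "1c" or "1mi" are ignored.
TNM_PATTERNS = {
    "T": _pattern("T", r"is|[0-4]|[Xx]"),
    "N": _pattern("N", r"[0-3]|[Xx]"),
    "M": _pattern("M", r"[01]|[Xx]"),
}


def extract_tnm(report: str, llm_infer: Callable[[str, str], str]) -> Dict[str, str]:
    """Return one label per category, calling llm_infer only when the rules fail."""
    stages: Dict[str, str] = {}
    for category, pattern in TNM_PATTERNS.items():
        match = pattern.search(report)
        if match:
            value = match.group(1)
            stages[category] = category + ("X" if value.lower() == "x" else value)
        else:
            # Fallback step: the caller supplies the model call (hypothetical interface).
            stages[category] = llm_infer(report, category)
    return stages


def placeholder_llm(report: str, category: str) -> str:
    # Stand-in for a real LLM call; always answers "unknown" for the category.
    return category + "X"


if __name__ == "__main__":
    text = "Invasive ductal carcinoma. Pathologic stage: pT2 N0. No distant metastasis identified."
    print(extract_tnm(text, placeholder_llm))
```

In this sketch the T and N categories of the example report are resolved by the rules, while M, which has no explicit mention, is delegated to the callback. Passing the model call in as a function keeps the rule-based step testable without any LLM dependency.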
Anthology ID:
2026.smm4h-1.22
Volume:
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
Month:
July
Year:
2026
Address:
San Diego, United States
Editors:
Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
Venues:
SMM4H | WS
Publisher:
Association for Computational Linguistics
Pages:
133–138
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.22/
Cite (ACL):
Natalia Madrueño, Jose Walter Hernández Pérez, Rubén R. Fernández, and Soto Montalvo. 2026. URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 133–138, San Diego, United States. Association for Computational Linguistics.
Cite (Informal):
URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow (Madrueño et al., SMM4H 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.22.pdf