URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow
Natalia Madrueño, Jose Walter Hernández Pérez, Rubén R. Fernández, Soto Montalvo
Abstract
TNM cancer staging is a critical process for characterizing tumor burden and guiding clinical decisions. Nevertheless, its automated extraction remains challenging due to the unstructured and heterogeneous nature of free-text pathology reports. This paper describes the participation of the URJC-Team in Task 6 of the Social Media Mining for Health/Health Real-World Data (#SMM4H-HeaRD) 2026 Shared Tasks. It focuses on predicting TNM staging from pathology reports. The proposed workflow combines hand-crafted regular expressions with a Large Language Model (LLM). First, explicit TNM mentions are extracted using rule-based patterns. Then, any stage not recovered by these rules is inferred by an LLM. Overall, the proposal provides competitive results across all official shared-task phases.
- Anthology ID:
- 2026.smm4h-1.22
- Volume:
- Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, United States
- Editors:
- Guillermo Lopez-Garcia, Graciela Gonzalez-Hernandez
- Venues:
- SMM4H | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 133–138
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.22/
- Cite (ACL):
- Natalia Madrueño, Jose Walter Hernández Pérez, Rubén R. Fernández, and Soto Montalvo. 2026. URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow. In Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks, pages 133–138, San Diego, United States. Association for Computational Linguistics.
- Cite (Informal):
- URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow (Madrueño et al., SMM4H 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.smm4h-1.22.pdf
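
The two-step workflow described in the abstract (rule-based extraction first, LLM inference only for what the rules miss) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regular expressions, the function names `extract_with_regex` and `predict_tnm`, the `llm_infer` callable, and the assumption that the output is one label per T, N, and M category are all hypothetical. The LLM is replaced by a stub so the example runs without any external service.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical value patterns for each category; the paper's actual rules are not given here.
_VALUES = {
    "T": r"(is|[0-4X][a-d]?)",
    "N": r"([0-3X][a-c]?)",
    "M": r"([01X][a-c]?)",
}

# Case-sensitive patterns. The lookbehind rejects a preceding uppercase letter,
# and the lookahead rejects a trailing lowercase letter or digit, so that
# concatenated mentions such as "pT2aN0M0" are still split correctly.
TNM_PATTERNS = {
    cat: re.compile(rf"(?<![A-Z]){cat}{val}(?![a-z0-9])")
    for cat, val in _VALUES.items()
}


def extract_with_regex(report: str) -> Dict[str, Optional[str]]:
    """Return the first explicit mention per category, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for cat, pattern in TNM_PATTERNS.items():
        match = pattern.search(report)
        found[cat] = f"{cat}{match.group(1)}" if match else None
    return found


def predict_tnm(report: str, llm_infer: Callable[[str, str], str]) -> Dict[str, str]:
    """Use regex results where available; ask the LLM only for missing categories."""
    stages = extract_with_regex(report)
    return {
        cat: value if value is not None else llm_infer(report, cat)
        for cat, value in stages.items()
    }


def fake_llm(report: str, category: str) -> str:
    """Stand-in for a real LLM call (assumption): always answers 'unknown'."""
    return f"{category}X"


if __name__ == "__main__":
    text = "Pathologic stage: pT2aN0. Metastasis not assessed."
    print(predict_tnm(text, fake_llm))
```

In this sketch the LLM is consulted once per missing category, which keeps explicit mentions deterministic and limits model calls to reports where the rules find nothing.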