Soto Montalvo
2026
URJC-Team at #SMM4H-HeaRD 2026: TNM Stage Extraction with a Regex-LLM Workflow
Natalia Madrueño | Jose Walter Hernández Pérez | Rubén R. Fernández | Soto Montalvo
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
TNM cancer staging is a critical process for characterizing tumor burden and guiding clinical decisions. Nevertheless, its automated extraction remains challenging due to the unstructured and heterogeneous nature of free-text pathology reports. This paper describes the participation of the URJC-Team in Task 6 of the Social Media Mining for Health/Health Real-World Data (#SMM4H-HeaRD) 2026 Shared Tasks, which focuses on predicting TNM staging from pathology reports. The proposed workflow combines hand-crafted regular expressions with a Large Language Model (LLM). First, explicit TNM mentions are extracted using rule-based patterns. Then, any stage not recovered by these rules is inferred by an LLM. Overall, the proposal achieves competitive results across all official shared-task phases.
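The two-step workflow described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the patterns in `TNM_PATTERNS`, the function `extract_tnm`, and the `llm_fallback` callable are all hypothetical names introduced here, and the regular expressions cover only a few simple notations such as "pT2N0".

```python
import re
from typing import Callable, Dict

# Hypothetical, deliberately simple patterns (assumption: the abstract does
# not list the real rules). Each allows an optional prefix such as "p" or
# "yp" and captures the value that follows the category letter.
TNM_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "T": re.compile(r"(?<![A-Za-z])[cpyra]{0,2}T(is|[0-4X])"),
    "N": re.compile(r"(?<![A-Za-z])[cpyra]{0,2}N([0-3X])"),
    "M": re.compile(r"(?<![A-Za-z])[cpyra]{0,2}M([01X])"),
}


def extract_tnm(report: str, llm_fallback: Callable[[str, str], str]) -> Dict[str, str]:
    """Return one stage per category: regex first, fallback callable otherwise."""
    stages: Dict[str, str] = {}
    for category, pattern in TNM_PATTERNS.items():
        match = pattern.search(report)
        if match:
            # Explicit mention found by the rule-based step, e.g. "T2".
            stages[category] = category + match.group(1)
        else:
            # Stage not recovered by the rules: delegate to the model.
            # llm_fallback is a placeholder for whatever LLM call is used.
            stages[category] = llm_fallback(report, category)
    return stages
```

In this sketch, `llm_fallback` receives the full report and the missing category, so it can be swapped for any model call or for a stub during testing.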
Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification
Rodrigo Morales-Sánchez | Soto Montalvo | Raquel Martínez
BioNLP 2026
Standard clinical Natural Language Processing (NLP) benchmarks often yield inflated metrics by forcing deterministic classification on ambiguous instances, thereby obscuring the clinical risks of overconfident predictions. To bridge this gap, we propose a risk-aware hybrid selective classification framework, evaluated on early Human Immunodeficiency Virus suspicion identification in Spanish clinical notes. Our dual-verification approach explicitly decouples aleatoric uncertainty through Mondrian conformal prediction and epistemic uncertainty using a Multi-Centroid Mahalanobis Distance veto. Empirical evaluations reveal that standard uncertainty metrics and baseline classifiers are structurally insufficient for safe medical triage, suffering severe coverage collapse when forced to operate under strict reliability constraints. In contrast, by demanding that clinical narratives pass both probabilistic and geometric safeguards, the proposed framework successfully isolates a highly trustworthy operational domain. The results show that explicit, decoupled uncertainty quantification is essential for translating biomedical NLP into responsible clinical practice.
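The dual-verification idea in this abstract can be sketched as follows. This is one plausible reading, not the authors' method: the nonconformity score (1 minus the class probability), the shared inverse covariance `cov_inv`, the thresholds `alpha` and `max_dist`, and all function names are assumptions made for illustration. A prediction is kept only when the class-conditional (Mondrian) conformal set contains exactly one label and the input lies close to at least one centroid.

```python
import math
from typing import List, Optional, Sequence


def mondrian_p_values(cal_probs: Sequence[Sequence[float]],
                      cal_labels: Sequence[int],
                      test_probs: Sequence[float]) -> List[float]:
    """Class-conditional conformal p-values with score = 1 - p(class)."""
    p_values = []
    for y in range(len(test_probs)):
        # Only calibration examples whose true label is y count for class y.
        cal_scores = [1.0 - probs[y]
                      for probs, label in zip(cal_probs, cal_labels) if label == y]
        test_score = 1.0 - test_probs[y]
        at_least = sum(score >= test_score for score in cal_scores)
        p_values.append((at_least + 1) / (len(cal_scores) + 1))
    return p_values


def min_mahalanobis(x: Sequence[float],
                    centroids: Sequence[Sequence[float]],
                    cov_inv: Sequence[Sequence[float]]) -> float:
    """Mahalanobis distance from x to its nearest centroid."""
    best = math.inf
    for centroid in centroids:
        diff = [xi - ci for xi, ci in zip(x, centroid)]
        d2 = sum(diff[j] * cov_inv[j][k] * diff[k]
                 for j in range(len(diff)) for k in range(len(diff)))
        best = min(best, math.sqrt(d2))
    return best


def selective_predict(test_probs, x, cal_probs, cal_labels, centroids, cov_inv,
                      alpha: float = 0.1, max_dist: float = 3.0) -> Optional[int]:
    """Return a label only if both checks pass; otherwise abstain (None)."""
    p_values = mondrian_p_values(cal_probs, cal_labels, test_probs)
    prediction_set = [y for y, p in enumerate(p_values) if p > alpha]
    if len(prediction_set) != 1:
        return None  # probabilistic check: ambiguous or empty set
    if min_mahalanobis(x, centroids, cov_inv) > max_dist:
        return None  # geometric veto: far from every centroid
    return prediction_set[0]
```

Returning `None` models abstention, so coverage is the fraction of inputs for which a label is returned under the chosen thresholds.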
2014
A Data Driven Approach for Person Name Disambiguation in Web Search Results
Agustín D. Delgado | Raquel Martínez | Víctor Fresno | Soto Montalvo
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2006
Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities
Soto Montalvo | Raquel Martínez | Arantza Casillas | Víctor Fresno
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics