Abanoub Abdelmalak


2026

GeMTeX is a large-scale German Medical Text Corpus project whose goal is to publish a national clinical reference corpus. The resource is currently under construction and comprises, as of February 2026, more than 15k clinical documents (20M tokens) from six German university hospitals. In building GeMTeX, care was taken to comply with European regulatory requirements. In phase I, patients were asked to permit reuse of their clinical documents on the legal basis of "informed consent". In phase II, consented documents from six major clinical sites in Germany underwent a thorough de-identification process. In phase III, we are currently enriching this unlocked dataset with semantic information from the clinical domain. This annotation process is guided by SNOMED CT, which makes it possible to ground expressions in clinical documents directly in a worldwide shared medical documentation and ontology standard. The resource is under active development and is accessible upon request under controlled access conditions. Interested researchers are invited to visit https://kiinformatik.mri.tum.de/en/gemtex or reach out via gemtex.mi@mh.tum.de.

2025

The PerAnsSumm Shared Task at CL4Health@NAACL 2025 aims to enhance healthcare community question-answering (CQA) by summarizing diverse user perspectives. It consists of two tasks: identifying and classifying perspective-specific spans (Task A) and generating structured, perspective-specific summaries from question-answer threads (Task B). The shared task uses the PUMA dataset. For Task A, a COVID-Twitter-BERT model pre-trained on COVID-related text from Twitter was employed to improve the model’s understanding of relevant vocabulary and context. For Task B, LLaMA was used in a prompt-based fashion. The proposed approach achieved 9th place in Task A and 16th place overall, with a best proportional classification F1-score of 0.74.