Tudor Groza


2024

REAL: A Retrieval-Augmented Entity Linking Approach for Biomedical Concept Recognition
Darya Shlyk | Tudor Groza | Marco Mesiti | Stefano Montanelli | Emanuele Cavalleri
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

Large Language Models (LLMs) offer an appealing alternative to training dedicated models for many Natural Language Processing (NLP) tasks. However, outdated knowledge and hallucination issues can be major obstacles to their application in knowledge-intensive biomedical scenarios. In this study, we consider the task of biomedical concept recognition (CR) from unstructured scientific literature and explore the use of Retrieval-Augmented Generation (RAG) to improve the accuracy and reliability of LLM-based biomedical CR. Our approach, named REAL (Retrieval Augmented Entity Linking), combines the generative capabilities of LLMs with curated knowledge bases to automatically annotate natural language texts with concepts from bio-ontologies. By applying REAL to benchmark corpora on phenotype concept recognition, we show its effectiveness in improving LLM-based CR performance. This research highlights the potential of combining LLMs with external knowledge sources to advance biomedical text processing.

2016

Building a dictionary of lexical variants for phenotype descriptors
Simon Kocbek | Tudor Groza
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Evaluating a dictionary of human phenotype terms focusing on rare diseases
Simon Kocbek | Toyofumi Fujiwara | Jin-Dong Kim | Toshihisa Takagi | Tudor Groza
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

Annotating medical text such as clinical notes with human phenotype descriptors is an important task that can, for example, assist in building patient profiles. To annotate text automatically, one usually needs a dictionary of predefined terms. However, due to the variety of human expressiveness, current state-of-the-art phenotype concept recognizers and automatic annotators struggle with specific domain issues and challenges. In this paper we present the results of annotating a gold standard corpus with a dictionary containing lexical variants of the Human Phenotype Ontology terms. The main purpose of the dictionary is to improve the recall of phenotype concept recognition systems. We compare the method with four other approaches and report the results.

2015

Similarity Metrics for Clustering PubMed Abstracts for Evidence Based Medicine
Hamed Hassanzadeh | Diego Mollá | Tudor Groza | Anthony Nguyen | Jane Hunter
Proceedings of the Australasian Language Technology Association Workshop 2015

UQeResearch: Semantic Textual Similarity Quantification
Hamed Hassanzadeh | Tudor Groza | Anthony Nguyen | Jane Hunter
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Automated Generation of Test Suites for Error Analysis of Concept Recognition Systems
Tudor Groza | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2014