Pablo Calleja
2025
Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora
Patricia Martín Chozas
|
Pablo Calleja
|
Carlos Rodríguez Limón
Proceedings of the 5th Conference on Language, Data and Knowledge
This paper intends to highlight the importance of reusing terminologies in the context of Large Language Models (LLMs), particularly within a Retrieval-Augmented Generation (RAG) scenario. We explore the application of query expansion techniques using a controlled terminology enriched with synonyms. Our case study focuses on the Spanish legal domain, investigating both query expansion and improvements in retrieval effectiveness within the RAG model. The experimental setup includes various LLMs, such as Mistral, LLaMA3.2, and Granite 3, along with multiple Spanish-language embedding models. The results demonstrate that integrating current neural approaches with linguistic resources enhances RAG performance, reinforcing the role of structured lexical and terminological knowledge in modern NLP pipelines.
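As a rough illustration of terminology-based query expansion, the sketch below appends synonyms from a small controlled vocabulary to a query before it would be passed to a retriever. It is not the authors' implementation: the `TERMINOLOGY` entries and the `expand_query` function are hypothetical, chosen only to show the idea on Spanish legal vocabulary.

```python
import re

# Hypothetical terminology: preferred term -> synonyms (illustrative entries only,
# not taken from the paper's resource).
TERMINOLOGY = {
    "contrato de trabajo": ["contrato laboral"],
    "despido": ["cese", "extinción del contrato"],
}


def expand_query(query, terminology=TERMINOLOGY):
    """Append the other labels of every terminology concept mentioned in the query."""
    lowered = query.lower()
    additions = []
    for term, synonyms in terminology.items():
        labels = [term] + synonyms
        # A concept counts as mentioned if any of its labels occurs as a whole word or phrase.
        mentioned = any(
            re.search(r"\b" + re.escape(label) + r"\b", lowered) for label in labels
        )
        if mentioned:
            for label in labels:
                # Skip labels already present in the query text or already queued.
                if label not in lowered and label not in additions:
                    additions.append(label)
    if not additions:
        return query
    return query + " " + " ".join(additions)


if __name__ == "__main__":
    print(expand_query("¿Qué indemnización corresponde por despido?"))
```

The expanded string would then be embedded or indexed in place of the original query; how the expansion is weighted or combined with the retriever is a design choice this sketch leaves open.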
2019
RGCL-WLV at SemEval-2019 Task 12: Toponym Detection
Alistair Plum
|
Tharindu Ranasinghe
|
Pablo Calleja
|
Constantin Orăsan
|
Ruslan Mitkov
Proceedings of the 13th International Workshop on Semantic Evaluation
This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved for one of the submissions was 89%, albeit at a relatively low recall of 49%.
2017
Role-based model for Named Entity Recognition
Pablo Calleja
|
Raúl García-Castro
|
Guadalupe Aguado-de-Cea
|
Asunción Gómez-Pérez
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Named Entity Recognition (NER) poses new challenges in real-world documents in which entities have different roles according to their purpose or meaning. Retrieving all possible entities in scenarios in which only a subset of them, selected by role, is needed introduces noise that lowers overall precision. This work proposes a NER model that relies on role classification models to support recognizing entities with a specific role. The proposed model has been implemented in two use cases using Spanish drug Summary of Product Characteristics: identification of therapeutic indications and identification of adverse reactions. The results show that precision increases when using a NER model that is oriented towards a specific role and discards out-of-scope entities.