Carlos Rodríguez Limón
2025
Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora
Patricia Martín Chozas | Pablo Calleja | Carlos Rodríguez Limón
Proceedings of the 5th Conference on Language, Data and Knowledge
This paper highlights the importance of reusing terminologies in the context of Large Language Models (LLMs), particularly in a Retrieval-Augmented Generation (RAG) scenario. We explore query expansion techniques that draw on a controlled terminology enriched with synonyms. Our case study focuses on the Spanish legal domain and investigates both the query expansion itself and the resulting improvements in retrieval effectiveness within the RAG pipeline. The experimental setup includes several LLMs, such as Mistral, LLaMA 3.2, and Granite 3, along with multiple Spanish-language embedding models. The results show that combining current neural approaches with linguistic resources enhances RAG performance, reinforcing the role of structured lexical and terminological knowledge in modern NLP pipelines.
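As a rough illustration of terminology-based query expansion, the sketch below appends the synonyms of any terminology entry found in a user query before the query is passed to a retriever. It is a minimal sketch under stated assumptions, not the authors' implementation: the `TERMINOLOGY` entries, the `expand_query` function, and the output format are all hypothetical, and the abstract does not specify how matching or expansion is actually performed.

```python
import re

# Hypothetical toy terminology: preferred term -> synonyms.
# These entries are illustrative only and are not taken from the paper.
TERMINOLOGY = {
    "contrato de trabajo": ["contrato laboral"],
    "despido": ["cese", "extinción del contrato"],
}


def expand_query(query, terminology):
    """Append the unmatched variants of every terminology entry found in the query."""
    lowered = query.lower()
    additions = []
    for preferred, synonyms in terminology.items():
        variants = [preferred, *synonyms]
        # Whole-word matching, so short terms do not fire inside longer words.
        matched = [
            v for v in variants
            if re.search(rf"\b{re.escape(v)}\b", lowered)
        ]
        if matched:
            for v in variants:
                if v not in matched and v not in additions:
                    additions.append(v)
    if not additions:
        return query  # No known term in the query: leave it unchanged.
    return f"{query} ({' | '.join(additions)})"


if __name__ == "__main__":
    print(expand_query("¿Qué indemnización corresponde por despido?", TERMINOLOGY))
```

In a RAG setting, the expanded string would typically be embedded or searched in place of the original query; whether the extra terms are appended, used as separate queries, or weighted differently is a design choice this sketch does not settle.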