Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora
Patricia Martín Chozas, Pablo Calleja, Carlos Rodríguez Limón
Abstract
This paper intends to highlight the importance of reusing terminologies in the context of Large Language Models (LLMs), particularly within a Retrieval-Augmented Generation (RAG) scenario. We explore the application of query expansion techniques using a controlled terminology enriched with synonyms. Our case study focuses on the Spanish legal domain, investigating both query expansion and improvements in retrieval effectiveness within the RAG model. The experimental setup includes various LLMs, such as Mistral, LLaMA3.2, and Granite 3, along with multiple Spanish-language embedding models. The results demonstrate that integrating current neural approaches with linguistic resources enhances RAG performance, reinforcing the role of structured lexical and terminological knowledge in modern NLP pipelines.
- Anthology ID:
- 2025.ldk-1.16
- Volume:
- Proceedings of the 5th Conference on Language, Data and Knowledge
- Month:
- September
- Year:
- 2025
- Address:
- Naples, Italy
- Editors:
- Mehwish Alam, Andon Tchechmedjiev, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
- Venues:
- LDK | WS
- Publisher:
- Unior Press
- Pages:
- 147–152
- URL:
- https://preview.aclanthology.org/ldl-25-ingestion/2025.ldk-1.16/
- Cite (ACL):
- Patricia Martín Chozas, Pablo Calleja, and Carlos Rodríguez Limón. 2025. Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora. In Proceedings of the 5th Conference on Language, Data and Knowledge, pages 147–152, Naples, Italy. Unior Press.
- Cite (Informal):
- Terminology Enhanced Retrieval Augmented Generation for Spanish Legal Corpora (Martín Chozas et al., LDK 2025)
- PDF:
- https://preview.aclanthology.org/ldl-25-ingestion/2025.ldk-1.16.pdf