@inproceedings{caldas-etal-2026-specializing,
title = "Specializing a Small Language Model for Closed-Domain {P}ortuguese {RAG} using Knowledge Graph Supervision",
author = "Caldas, Josu{\'e} and
Souza, Elvis de and
Silva, Patr{\'i}cia and
Pacheco, Marco",
editor = "Souza, Marlo and
de-Dios-Flores, Iria and
Santos, Diana and
Freitas, Larissa and
Souza, Jackson Wilke da Cruz and
Ribeiro, Eug{\'e}nio",
booktitle = "Proceedings of the 17th International Conference on Computational Processing of {P}ortuguese ({PROPOR} 2026) - Vol. 1",
month = apr,
year = "2026",
address = "Salvador, Brazil",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-dnd/2026.propor-1.28/",
pages = "281--290",
ISBN = "979-8-89176-387-6",
abstract = "Fine-tuned small language models (SLMs) have emerged as effective alternatives for closed-domain tasks, where large language models (LLMs) often lack sufficient parametric knowledge. This study presents a methodology for adapting a small language model to a closed-domain question answering (QA) task. For each question, the model is trained to output an answer based on the most relevant context passage among ten provided candidates, thus reproducing the logic of a Retrieval-Augmented Generation (RAG) framework. The fine-tuning data were derived from PetroKGraph, an existing knowledge graph built from Portuguese-language resources in the oil and gas (O{\&}G) domain. Experimental results show that the fine-tuned model achieves an accuracy improvement of 20 percentage points over the base model on closed-domain questions. It also surpasses GPT-4o and GPT-4o Mini by 12 and 25 points, respectively. Moreover, its performance on general-domain tasks remains comparable to that of the base model, indicating that the specialized model effectively learned domain-specific knowledge while maintaining general reasoning capabilities."
}