LARI Dataset: A Native Portuguese Question Answering Dataset from Brasileiras em PLN
Júlia da Rocha Junqueira, Larissa A. de Freitas, Ulisses Brisolara Corrêa
Abstract
Recent advances in the field have revolutionized Question and Answering (QA). However, for languages like Portuguese, progress is often hindered by the lack of native training resources. To address this gap, this paper introduces LARI, a new dataset designed to benchmark and enhance QA in Portuguese. Our methodology combines the capabilities of the Sabiá-7B model, fine-tuned via QLoRA on a domain-specific corpus, with human validation. We utilized the book Natural Language Processing – Concepts, Techniques, and Applications in Portuguese (2nd Edition), as a case study for content extraction. The generated instances underwent expert human evaluation, achieving an average quality score of 4.47 out of 5. The final dataset, comprising 464 context-question-answer triples, is made publicly available to the community, offering a valuable resource for future research in low-resource settings.
- Anthology ID:
- 2026.propor-1.110
- Volume:
- Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
- Month:
- April
- Year:
- 2026
- Address:
- Salvador, Brazil
- Editors:
- Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
- Venue:
- PROPOR
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1055–1061
- URL:
- https://preview.aclanthology.org/ingest-dnd/2026.propor-1.110/
- Cite (ACL):
- Júlia da Rocha Junqueira, Larissa A. de Freitas, and Ulisses Brisolara Corrêa. 2026. LARI Dataset: A Native Portuguese Question Answering Dataset from Brasileiras em PLN. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 1055–1061, Salvador, Brazil. Association for Computational Linguistics.
- Cite (Informal):
- LARI Dataset: A Native Portuguese Question Answering Dataset from Brasileiras em PLN (Junqueira et al., PROPOR 2026)
- PDF:
- https://preview.aclanthology.org/ingest-dnd/2026.propor-1.110.pdf