Causal_QA.PT: A Human–LLM Co-Curated Benchmark for Causal Question Answering in Portuguese Language

Lia Furtado; Cíntia Araripe; Jocelani Castilhos; Lucas Holanda; Vládia Pinheiro

Abstract

We present Causal_QA.PT, a human–LLM co-curated benchmark for causal question answering in Portuguese, addressing the lack of high-quality evaluation resources for causal reasoning in non-English languages. The dataset is developed through a hybrid human–LLM process with targeted generation, validation, and evaluation procedures, and is organized according to the PEARL causal typology. Using this resource, we evaluate the ability of Large Language Models to answer causal questions in Portuguese and examine the role of explicitly providing causal class information in prompt design. Our findings show that current LLMs are capable of producing high-quality causal responses in Portuguese, with GPT-5 Mini in particular demonstrating strong performance in judgment-based evaluation. Explicit causal class information yields model- and question-dependent benefits, particularly for interventional and counterfactual questions. Finally, we observe that human reference answers are not always superior, underscoring the importance of careful benchmark curation and robust evaluation for underrepresented languages.

Anthology ID: 2026.propor-1.65
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 657–666
URL: https://preview.aclanthology.org/ingest-dnd/2026.propor-1.65/
Cite (ACL): Lia Furtado, Cíntia Araripe, Jocelani Castilhos, Lucas Holanda, and Vladia Pinheiro. 2026. Causal_QA.PT: A Human–LLM Co-Curated Benchmark for Causal Question Answering in Portuguese Language. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 657–666, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Causal_QA.PT: A Human–LLM Co-Curated Benchmark for Causal Question Answering in Portuguese Language (Furtado et al., PROPOR 2026)
PDF: https://preview.aclanthology.org/ingest-dnd/2026.propor-1.65.pdf
