Pedro Kroll
2026
ESG-QA: Building a Dataset for Question Answering on Environmental, Social, and Governance Pillars
Gabriel Assis | Ayrton Surica | Pedro Kroll | Gabriela Aires Mendes | Darian Rabbani | Edson Bollis | Lucas Francisco Amaral Orosco Pellicer | Aline Paes
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Environmental, Social, and Governance (ESG) factors are becoming increasingly central to corporate accountability and sustainable development. However, benchmarks for evaluating large language models (LLMs) in this domain remain scarce. To address this gap, we present ESG-QA, a dataset of 87,261 question–answer–context triplets spanning the three ESG pillars. ESG-QA was built using an LLM-based question–answer (QA) generation pipeline, refined through rule-based and semantic filtering and validated by human inspection; it supports both abstractive QA and retrieval-augmented setups. We benchmark three open-weight LLM families (Llama-3, Gemma-3, and Qwen-3) across multiple dimensions, including correctness, environmental impact, and readability. Results show that Qwen-3 with retrieval achieves the highest absolute QA performance, while Gemma-3 provides the strongest overall balance between correctness, efficiency, and clarity. By releasing ESG-QA and its generation framework, this work establishes a comprehensive benchmark for advancing ESG-oriented QA and promoting more transparent and responsible AI evaluation.