WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain

Rounak Meyur; Hung Phan; Sridevi Wagle; Jan Strube; Mahantesh Halappanavar; Sameera Horawalavithana; Anurag Acharya; Sai Munikoti

doi:10.18653/v1/2025.nlp4pi-1.20

Abstract

Wind energy project assessments present significant challenges for decision-makers, who must navigate and synthesize hundreds of pages of environmental and scientific documentation. These documents often span different regions and project scales, covering multiple domains of expertise. This process traditionally demands immense time and specialized knowledge from decision-makers. The advent of Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) approaches offers a transformative solution, enabling rapid, accurate cross-document information retrieval and synthesis. As the landscape of Natural Language Processing (NLP) and text generation continues to evolve, benchmarking becomes essential to evaluate and compare the performance of different RAG-based LLMs. In this paper, we present a comprehensive framework to generate a domain-relevant RAG benchmark. Our framework is based on automatic question-answer generation with Human (domain experts)-AI (LLM) teaming. As a case study, we demonstrate the framework by introducing WeQA, a first-of-its-kind benchmark for the wind energy domain that comprises multiple scientific documents and reports related to environmental aspects of wind energy projects. Our framework systematically evaluates RAG performance using diverse metrics and multiple question types with varying complexity levels, providing a foundation for rigorous assessment of RAG-based systems in complex scientific domains and enabling researchers to identify areas for improvement in domain-specific applications.

Anthology ID: 2025.nlp4pi-1.20
Volume: Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Katherine Atwell, Laura Biester, Angana Borah, Daryna Dementieva, Oana Ignat, Neema Kotonya, Ziyi Liu, Ruyuan Wan, Steven Wilson, Jieyu Zhao
Venues: NLP4PI | WS
Publisher: Association for Computational Linguistics
Pages: 239–251
URL: https://preview.aclanthology.org/landing_page/2025.nlp4pi-1.20/
DOI: 10.18653/v1/2025.nlp4pi-1.20
Cite (ACL): Rounak Meyur, Hung Phan, Sridevi Wagle, Jan Strube, Mahantesh Halappanavar, Sameera Horawalavithana, Anurag Acharya, and Sai Munikoti. 2025. WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain. In Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI), pages 239–251, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain (Meyur et al., NLP4PI 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.nlp4pi-1.20.pdf
