Automated Question-Answer Generation for Evaluating RAG-based Chatbots

Juan José González Torres; Mihai Bogdan Bîndilă; Sebastiaan Hofstee; Daniel Szondy; Quang-Hung Nguyen; Shenghui Wang; Gwenn Englebienne

Abstract

In this research, we propose a framework that automatically generates human-like question-answer pairs with long or factoid answers and uses them to automatically evaluate the quality of Retrieval-Augmented Generation (RAG). Our framework can also create datasets that assess the hallucination levels of Large Language Models (LLMs) by simulating unanswerable questions. We then apply the framework to create a dataset of question-answer (QA) pairs based on more than 1,000 leaflets about the medical and administrative procedures of a hospital. The dataset was evaluated by hospital specialists, who confirmed that more than 50% of the QA pairs are applicable. Finally, we show that our framework can be used to evaluate LLM performance by testing Llama-2-13B fine-tuned for Dutch (Vanroy, 2023) on the generated dataset. The method appears promising for testing how models handle unanswerable and factoid questions.

Anthology ID:: 2024.cl4health-1.25
Volume:: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
Venues:: CL4Health | WS
Publisher:: ELRA and ICCL
Pages:: 204–214
URL:: https://aclanthology.org/2024.cl4health-1.25
Cite (ACL):: Juan José González Torres, Mihai Bogdan Bîndilă, Sebastiaan Hofstee, Daniel Szondy, Quang-Hung Nguyen, Shenghui Wang, and Gwenn Englebienne. 2024. Automated Question-Answer Generation for Evaluating RAG-based Chatbots. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 204–214, Torino, Italia. ELRA and ICCL.
Cite (Informal):: Automated Question-Answer Generation for Evaluating RAG-based Chatbots (González Torres et al., CL4Health-WS 2024)
PDF:: https://preview.aclanthology.org/landing_page/2024.cl4health-1.25.pdf
