Automated Question-Answer Generation for Evaluating RAG-based Chatbots

Juan José González Torres, Mihai Bogdan Bîndilă, Sebastiaan Hofstee, Daniel Szondy, Quang-Hung Nguyen, Shenghui Wang, Gwenn Englebienne


Abstract
We propose a framework that automatically generates human-like question-answer pairs with long or factoid answers and uses them to automatically evaluate the quality of Retrieval-Augmented Generation (RAG). The framework can also create datasets that assess the hallucination levels of Large Language Models (LLMs) by simulating unanswerable questions. We apply the framework to create a dataset of question-answer (QA) pairs based on more than 1,000 leaflets about the medical and administrative procedures of a hospital. The dataset was evaluated by hospital specialists, who confirmed that more than 50% of the QA pairs are applicable. Finally, we show that the framework can be used to evaluate LLM performance by testing Llama-2-13B fine-tuned on Dutch (Vanroy, 2023) with the generated dataset; the results suggest that the method is promising for testing how models handle unanswerable and factoid questions.
Anthology ID:
2024.cl4health-1.25
Volume:
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
Venues:
CL4Health | WS
Publisher:
ELRA and ICCL
Pages:
204–214
URL:
https://aclanthology.org/2024.cl4health-1.25
Cite (ACL):
Juan José González Torres, Mihai Bogdan Bîndilă, Sebastiaan Hofstee, Daniel Szondy, Quang-Hung Nguyen, Shenghui Wang, and Gwenn Englebienne. 2024. Automated Question-Answer Generation for Evaluating RAG-based Chatbots. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 204–214, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Automated Question-Answer Generation for Evaluating RAG-based Chatbots (González Torres et al., CL4Health-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.cl4health-1.25.pdf