Automated Question-Answer Generation for Evaluating RAG-based Chatbots
Juan José González Torres, Mihai Bogdan Bîndilă, Sebastiaan Hofstee, Daniel Szondy, Quang-Hung Nguyen, Shenghui Wang, Gwenn Englebienne
Abstract
In this research, we propose a framework that automatically generates human-like question-answer pairs with long or factoid answers and uses them to automatically evaluate the quality of Retrieval-Augmented Generation (RAG). The framework can also create datasets that assess the hallucination levels of Large Language Models (LLMs) by simulating unanswerable questions. We apply the framework to create a dataset of question-answer (QA) pairs based on more than 1,000 leaflets about the medical and administrative procedures of a hospital. The dataset was evaluated by hospital specialists, who confirmed that more than 50% of the QA pairs are applicable. Finally, we show that the framework can be used to evaluate LLM performance by testing Llama-2-13B fine-tuned for Dutch (Vanroy, 2023) on the generated dataset, and that the method appears promising for testing how models handle unanswerable and factoid questions.
- Anthology ID:
- 2024.cl4health-1.25
- Volume:
- Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
- Venues:
- CL4Health | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 204–214
- URL:
- https://aclanthology.org/2024.cl4health-1.25
- Cite (ACL):
- Juan José González Torres, Mihai Bogdan Bîndilă, Sebastiaan Hofstee, Daniel Szondy, Quang-Hung Nguyen, Shenghui Wang, and Gwenn Englebienne. 2024. Automated Question-Answer Generation for Evaluating RAG-based Chatbots. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 204–214, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Automated Question-Answer Generation for Evaluating RAG-based Chatbots (González Torres et al., CL4Health-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.cl4health-1.25.pdf