Retrieval-based Evaluation for LLMs: A Case Study in Korean Legal QA

Cheol Ryu; Seolhwa Lee; Subeen Pang; Chanyeol Choi; Hojun Choi; Myeonggee Min; Jy-Yong Sohn

doi:10.18653/v1/2023.nllp-1.13

Abstract

While large language models (LLMs) have demonstrated significant capabilities in text generation, their use in areas requiring domain-specific expertise, such as law, must be approached cautiously. This caution is warranted by the inherent challenges of LLM-generated texts, including the potential presence of factual errors. Motivated by this issue, we propose Eval-RAG, a new evaluation method for LLM-generated texts. Unlike existing methods, Eval-RAG evaluates the validity of generated texts based on the related documents that are collected by the retriever. In other words, Eval-RAG adopts the idea of retrieval augmented generation (RAG) for the purpose of evaluation. Our experimental results on Korean Legal Question-Answering (QA) tasks show that conventional LLM-based evaluation methods align better with lawyers’ evaluations when combined with Eval-RAG. In addition, our qualitative analysis shows that Eval-RAG successfully finds factual errors in LLM-generated texts that existing evaluation methods miss.
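The idea of grounding an evaluation in retrieved documents can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap retriever, the prompt wording, and the names `retrieve`, `build_eval_prompt`, `eval_with_retrieval`, and `judge` are all assumptions made for illustration. The `judge` argument stands in for any LLM-based evaluator that maps a prompt to a score.

```python
from typing import Callable, List


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenizer; a stand-in for real text processing."""
    return set(text.lower().split())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_eval_prompt(question: str, answer: str, documents: List[str]) -> str:
    """Assemble a judge prompt that places retrieved documents before the answer."""
    refs = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Reference documents:\n" + refs + "\n\n"
        "Question: " + question + "\n"
        "Answer to evaluate: " + answer + "\n"
        "Rate the factual validity of the answer against the references."
    )


def eval_with_retrieval(
    question: str,
    answer: str,
    corpus: List[str],
    judge: Callable[[str], float],  # hypothetical LLM-based evaluator
    k: int = 2,
) -> float:
    """Retrieve documents related to the QA pair, then score the answer with the judge."""
    documents = retrieve(question + " " + answer, corpus, k)
    return judge(build_eval_prompt(question, answer, documents))
```

Passing the evaluator as a callable keeps the sketch independent of any particular model or API; a conventional LLM-based evaluator would correspond to calling `judge` on a prompt without the reference documents.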

Anthology ID: 2023.nllp-1.13
Volume: Proceedings of the Natural Legal Language Processing Workshop 2023
Month: December
Year: 2023
Address: Singapore
Editors: Daniel Preoțiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos Spanakis, Nikolaos Aletras
Venues: NLLP | WS
Publisher: Association for Computational Linguistics
Pages: 132–137
URL: https://aclanthology.org/2023.nllp-1.13
DOI: 10.18653/v1/2023.nllp-1.13
Cite (ACL): Cheol Ryu, Seolhwa Lee, Subeen Pang, Chanyeol Choi, Hojun Choi, Myeonggee Min, and Jy-Yong Sohn. 2023. Retrieval-based Evaluation for LLMs: A Case Study in Korean Legal QA. In Proceedings of the Natural Legal Language Processing Workshop 2023, pages 132–137, Singapore. Association for Computational Linguistics.
Cite (Informal): Retrieval-based Evaluation for LLMs: A Case Study in Korean Legal QA (Ryu et al., NLLP-WS 2023)
PDF: https://preview.aclanthology.org/add_acl24_videos/2023.nllp-1.13.pdf
