LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation

Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh


Abstract
We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions in which the LLM interviewer actively provides feedback on responses and poses follow-up questions to the evaluated LLM. At the start of the interview, the LLM interviewer dynamically modifies datasets to generate initial questions, mitigating data contamination. We apply the LLM-as-an-Interviewer framework to evaluate six models on reasoning, factuality, and instruction-following tasks. Our results show that the framework effectively provides insights into LLM performance, including the quality of initial responses, adaptability to feedback, and ability to address follow-up queries such as requests for clarification or additional knowledge. The framework also addresses key limitations of conventional methods like LLM-as-a-Judge, including verbosity bias and inconsistency across runs. Finally, we propose the Interview Report, which aggregates insights from the interview process, providing examples and a comprehensive analysis of the LLM’s strengths and weaknesses. This report offers a detailed snapshot of the model’s real-world applicability.
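The interview loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_interview`, `modify_question`, `answer`, and `give_feedback` are hypothetical, the three callables stand in for LLM calls, and the turn limit of 3 is an arbitrary assumption. Follow-up questions after a correct answer are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    question: str   # prompt shown to the evaluated model on this turn
    answer: str     # the evaluated model's response
    feedback: str   # the interviewer's feedback on that response
    correct: bool   # the interviewer's verdict


@dataclass
class InterviewLog:
    turns: List[Turn] = field(default_factory=list)


def run_interview(
    seed_question: str,
    modify_question: Callable[[str], str],            # hypothetical: interviewer rewrites the dataset item
    answer: Callable[[str, List[Turn]], str],          # hypothetical: evaluated model
    give_feedback: Callable[[str, str], Tuple[bool, str]],  # hypothetical: interviewer grades and comments
    max_turns: int = 3,                                # assumed limit, not taken from the paper
) -> InterviewLog:
    log = InterviewLog()
    # Step 1: modify the benchmark item so the initial question is not a verbatim dataset entry.
    question = modify_question(seed_question)
    prompt = question
    for _ in range(max_turns):
        # Step 2: the evaluated model answers, seeing the earlier turns.
        response = answer(prompt, log.turns)
        # Step 3: the interviewer judges the response and writes feedback.
        correct, feedback = give_feedback(question, response)
        log.turns.append(Turn(prompt, response, feedback, correct))
        if correct:
            break
        # Step 4: on failure, re-ask the question with the feedback attached.
        prompt = f"{question}\nFeedback on your previous answer: {feedback}"
    return log
```

A log of this kind records both the first-attempt result and whether the model recovered after feedback, which are two of the signals the abstract says the framework reports.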
Anthology ID:
2025.findings-acl.1357
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
26456–26493
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1357/
DOI:
10.18653/v1/2025.findings-acl.1357
Cite (ACL):
Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, and Alice Oh. 2025. LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26456–26493, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation (Kim et al., Findings 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1357.pdf