LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh
Abstract
We introduce LLM-as-an-Interviewer, a novel paradigm for evaluating large language models (LLMs). This approach leverages multi-turn interactions in which the LLM interviewer actively provides feedback on responses and poses follow-up questions to the evaluated LLM. At the start of the interview, the LLM interviewer dynamically modifies datasets to generate initial questions, mitigating data contamination. We apply the LLM-as-an-Interviewer framework to evaluate six models on reasoning, factuality, and instruction-following tasks. Our results show that the framework effectively provides insights into LLM performance, including the quality of initial responses, adaptability to feedback, and ability to address follow-up queries such as requests for clarification or additional knowledge. The framework also addresses key limitations of conventional methods like LLM-as-a-Judge, including verbosity bias and inconsistency across runs. Finally, we propose the Interview Report, which aggregates insights from the interview process, providing examples and a comprehensive analysis of the LLM’s strengths and weaknesses. This report offers a detailed snapshot of the model’s real-world applicability.
- Anthology ID: 2025.findings-acl.1357
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 26456–26493
- URL: https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1357/
- DOI: 10.18653/v1/2025.findings-acl.1357
- Cite (ACL): Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, and Alice Oh. 2025. LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26456–26493, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation (Kim et al., Findings 2025)
- PDF: https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.1357.pdf
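
The interview procedure summarized in the abstract (modify a seed question, collect an answer, give feedback, then ask a follow-up) could be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the abstract does not give the prompts, grading criteria, or stopping rule, so the names `Interviewer`, `Turn`, `run_interview`, and `max_retries`, as well as the toy stand-ins used in place of real LLM calls, are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    prompt: str
    answer: str
    feedback: str
    correct: Optional[bool]  # None means the turn was not graded


@dataclass
class Interviewer:
    # Hypothetical stand-ins for LLM calls; the paper's real prompts are not shown here.
    modify: Callable[[str], str]              # rewrite a seed question from a dataset
    grade: Callable[[str, str], bool]         # judge (question, answer)
    give_feedback: Callable[[str, str], str]  # comment on a wrong answer
    follow_up: Callable[[str, str], str]      # ask a follow-up after a correct answer


def run_interview(seed_question: str,
                  interviewer: Interviewer,
                  interviewee: Callable[[str], str],
                  max_retries: int = 2) -> List[Turn]:
    """Run one multi-turn interview and return its transcript."""
    transcript: List[Turn] = []
    # Modify the seed question first, so the model is not asked the verbatim benchmark item.
    question = interviewer.modify(seed_question)
    prompt = question
    for _ in range(max_retries + 1):
        answer = interviewee(prompt)
        correct = interviewer.grade(question, answer)
        feedback = "" if correct else interviewer.give_feedback(question, answer)
        transcript.append(Turn(prompt, answer, feedback, correct))
        if correct:
            break
        # Let the interviewee revise its answer in light of the feedback.
        prompt = f"{question}\nFeedback: {feedback}"
    if transcript[-1].correct:
        # Assumed policy: ask one ungraded follow-up once the answer is accepted.
        follow_q = interviewer.follow_up(question, transcript[-1].answer)
        transcript.append(Turn(follow_q, interviewee(follow_q), "", None))
    return transcript


# Toy stand-ins so the sketch runs without any model access.
toy_interviewer = Interviewer(
    modify=lambda q: q.replace("2 + 3", "4 + 5"),
    grade=lambda q, a: "9" in a,
    give_feedback=lambda q, a: "Re-check the addition.",
    follow_up=lambda q, a: "Explain your steps.",
)


def toy_interviewee(prompt: str) -> str:
    if "Explain" in prompt:
        return "I added 4 and 5."
    # Answers wrongly at first, then correctly once feedback is present.
    return "9" if "Feedback:" in prompt else "8"


transcript = run_interview("What is 2 + 3?", toy_interviewer, toy_interviewee)
for turn in transcript:
    print(turn)
```

With the toy stand-ins, the transcript contains a wrong first answer, a corrected answer after feedback, and one follow-up exchange. A transcript like this is the sort of record from which separate measures for initial response quality, adaptability to feedback, and follow-up handling could be derived.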