Adaptively profiling models with task elicitation

Davis Brown; Prithvi Balehannina; Helen Jin; Shreya Havaldar; Hamed Hassani; Eric Wong

Abstract

Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks—an order of magnitude more than prior work—where frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI and that o3-mini is prone to hallucination when fabrications are repeated in-context.

Anthology ID: 2025.emnlp-main.1270
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 24996–25031
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1270/
Cite (ACL): Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, and Eric Wong. 2025. Adaptively profiling models with task elicitation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24996–25031, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Adaptively profiling models with task elicitation (Brown et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1270.pdf
Checklist: 2025.emnlp-main.1270.checklist.pdf