Pseudointelligence: A Unifying Lens on Language Model Evaluation

Shikhar Murty, Orr Paradise, Pratyusha Sharma


Abstract
With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach to the targeted evaluation of model capabilities. Inspired by pseudorandomness, we propose pseudointelligence, which captures the maxim that “(perceived) intelligence lies in the eye of the beholder”: claims of intelligence are meaningful only when the evaluator making them is taken into account. Concretely, we propose a complexity-theoretic framework in which model evaluation is cast as a dynamic interaction between a model and a learned evaluator. We demonstrate that this framework can be used to reason about two case studies in language model evaluation, as well as to analyze existing evaluation methods.
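To make the pseudorandomness analogy concrete, the sketch below illustrates one way a distinguisher-style evaluation could look in code: an evaluator interacts with either the model or a reference “source of intelligence” and outputs a verdict, and the model counts as fooling the evaluator if the two verdict distributions are close. This is a minimal, hypothetical illustration, not the paper's formal definitions; all names (Agent, Evaluator, distinguishing_advantage, epsilon) are assumptions, and the learned nature of both model and evaluator emphasized in the paper is omitted here.

```python
# Illustrative sketch of a distinguisher-style evaluation game suggested by the
# pseudorandomness analogy. Hypothetical interfaces, not the paper's framework.

import random
from typing import Callable, List

Agent = Callable[[str], str]          # maps an evaluator query to a response
Evaluator = Callable[[Agent], bool]   # interacts with an agent, outputs accept/reject


def distinguishing_advantage(evaluator: Evaluator,
                             model: Agent,
                             source: Agent,
                             trials: int = 1000) -> float:
    """Estimate |Pr[evaluator accepts model] - Pr[evaluator accepts source]|."""
    accept_model = sum(evaluator(model) for _ in range(trials)) / trials
    accept_source = sum(evaluator(source) for _ in range(trials)) / trials
    return abs(accept_model - accept_source)


def is_pseudointelligent(model: Agent,
                         source: Agent,
                         evaluator_class: List[Evaluator],
                         epsilon: float = 0.05) -> bool:
    """Model fools every evaluator in the class up to slack epsilon."""
    return all(
        distinguishing_advantage(e, model, source) <= epsilon
        for e in evaluator_class
    )


# Toy usage: an evaluator that asks one arithmetic question per interaction.
def arithmetic_evaluator(agent: Agent) -> bool:
    a, b = random.randint(1, 9), random.randint(1, 9)
    return agent(f"What is {a}+{b}?").strip() == str(a + b)


# A "source" that always answers correctly, and a model that is right 97% of the time.
source_agent: Agent = lambda q: str(sum(int(x) for x in q.split()[-1].rstrip("?").split("+")))
model_agent: Agent = lambda q: source_agent(q) if random.random() < 0.97 else "7"

print(is_pseudointelligent(model_agent, source_agent, [arithmetic_evaluator]))
```

The key point the sketch tries to convey is that the verdict is relative to the evaluator class: a model that fools a weak class of evaluators may fail against a stronger (or better-trained) one, which is the sense in which perceived intelligence lies in the eye of the beholder.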
Anthology ID: 2023.findings-emnlp.485
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7284–7290
URL: https://aclanthology.org/2023.findings-emnlp.485
DOI: 10.18653/v1/2023.findings-emnlp.485
Cite (ACL): Shikhar Murty, Orr Paradise, and Pratyusha Sharma. 2023. Pseudointelligence: A Unifying Lens on Language Model Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7284–7290, Singapore. Association for Computational Linguistics.
Cite (Informal): Pseudointelligence: A Unifying Lens on Language Model Evaluation (Murty et al., Findings 2023)
PDF: https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.485.pdf