Abstract
As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this “arms race” by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI’s latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run “unit test”, without resorting to large benchmark suites of complex tasks.
- Anthology ID:
- 2023.findings-acl.666
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10476–10501
- URL:
- https://aclanthology.org/2023.findings-acl.666
- DOI:
- 10.18653/v1/2023.findings-acl.666
- Cite (ACL):
- Avia Efrat, Or Honovich, and Omer Levy. 2023. LMentry: A Language Model Benchmark of Elementary Language Tasks. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10476–10501, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- LMentry: A Language Model Benchmark of Elementary Language Tasks (Efrat et al., Findings 2023)
- PDF:
- https://aclanthology.org/2023.findings-acl.666.pdf
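The abstract above pitches LMentry as a quick, automatic “unit test” for language models. As a concrete illustration, here is a minimal, hypothetical Python sketch of how one such task (“which of two words is longer”) could be scored by pattern matching over a model’s free-text output. The function name and the single-regex rule are illustrative assumptions, not the benchmark’s actual scoring code; the official LMentry implementation defines richer, hand-crafted answer patterns per task.

```python
import re


def score_longer_word(model_output: str, word1: str, word2: str) -> int:
    """Toy scorer for the task 'Which word is longer, word1 or word2?'.

    Returns 1 if the model's output names the longer word (and not the
    shorter one), 0 otherwise. Assumes the two words differ in length,
    so the task has a unique correct answer. This is an illustrative
    sketch, not LMentry's actual scoring code.
    """
    longer, shorter = (word1, word2) if len(word1) > len(word2) else (word2, word1)
    out = model_output.lower()
    # Match whole words only, so e.g. 'cat' does not match inside 'catalog'.
    mentions_longer = re.search(rf"\b{re.escape(longer.lower())}\b", out)
    mentions_shorter = re.search(rf"\b{re.escape(shorter.lower())}\b", out)
    return int(bool(mentions_longer) and not mentions_shorter)


# Hypothetical model outputs, scored without any model or API call:
print(score_longer_word("The longer word is encyclopedia.", "cat", "encyclopedia"))  # 1
print(score_longer_word("cat", "cat", "encyclopedia"))                               # 0
```

Even this naive rule shows why the scoring patterns matter: a correct answer phrased as “encyclopedia is longer than cat” mentions both words and would score 0 here, which is one reason a benchmark of this kind needs multiple accepted answer patterns per task.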