Moran Mizrahi
2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, Gabriel Stanovsky
Transactions of the Association for Computational Linguistics, Volume 12
Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittleness introduced by single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. We find that different instruction templates lead to very different performance, both absolute and relative. Instead of relying on a single template, we propose a set of diverse metrics computed over multiple instruction paraphrases, each tailored to a different use case (e.g., LLM development vs. downstream development), to ensure a more reliable and meaningful assessment of LLM capabilities. We show that our metrics provide new insights into the strengths and limitations of current LLMs.
2020
Coming to Terms: Automatic Formation of Neologisms in Hebrew
Moran Mizrahi, Stav Yardeni Seelig, Dafna Shahaf
Findings of the Association for Computational Linguistics: EMNLP 2020
Spoken languages are ever-changing, with new words entering them all the time. However, coming up with new words (neologisms) today relies exclusively on human creativity. In this paper, we propose a system to automatically suggest neologisms. We focus on the Hebrew language as a test case due to the unusual regularity of its noun formation. User studies comparing our algorithm to experts and non-experts demonstrate that our algorithm can generate high-quality outputs and can also enhance human creativity. More broadly, we seek to inspire more computational work around the topic of linguistic creativity, which we believe offers numerous unexplored opportunities.
Co-authors
- Dafna Shahaf (2)
- Dan Malkin (1)
- Gabriel Stanovsky (1)
- Guy Kaplan (1)
- Rotem Dror (1)