Moran Mizrahi
2026
🧑‍🍳 Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination
Moran Mizrahi | Chen Shani | Gabriel Stanovsky | Dan Jurafsky | Dafna Shahaf
Transactions of the Association for Computational Linguistics, Volume 14
Large Language Models (LLMs) excel at many tasks, yet they struggle to produce truly creative, diverse ideas. In this paper, we introduce a novel approach that enhances LLM creativity. We apply LLMs for translating between natural language and structured representations, and perform the core creative leap via cognitively inspired manipulations on these representations. Our notion of creativity goes beyond superficial token-level variations; rather, we recombine structured representations of existing ideas, enabling our system to effectively explore a more abstract landscape of ideas. We demonstrate our approach in the culinary domain with DishCover, a model that generates creative recipes. Experiments and domain-expert evaluations reveal that our outputs, which are mostly coherent and feasible, significantly surpass GPT-4o in terms of novelty and diversity, thus outperforming it in creative generation. We hope our work inspires further research into structured creativity in AI.
2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi | Guy Kaplan | Dan Malkin | Rotem Dror | Dafna Shahaf | Gabriel Stanovsky
Transactions of the Association for Computational Linguistics, Volume 12
Recent advances in LLMs have led to an abundance of evaluation benchmarks, which typically rely on a single instruction template per task. We create a large-scale collection of instruction paraphrases and comprehensively analyze the brittleness introduced by single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. We find that different instruction templates lead to very different performance, both absolute and relative. To address this, we propose a set of diverse metrics on multiple instruction paraphrases, specifically tailored for different use cases (e.g., LLM vs. downstream development), ensuring a more reliable and meaningful assessment of LLM capabilities. We show that our metrics provide new insights into the strengths and limitations of current LLMs.
2020
Coming to Terms: Automatic Formation of Neologisms in Hebrew
Moran Mizrahi | Stav Yardeni Seelig | Dafna Shahaf
Findings of the Association for Computational Linguistics: EMNLP 2020
Spoken languages are ever-changing, with new words entering them all the time. However, coming up with new words (neologisms) today relies exclusively on human creativity. In this paper we propose a system to automatically suggest neologisms. We focus on the Hebrew language as a test case due to the unusual regularity of its noun formation. User studies comparing our algorithm to experts and non-experts demonstrate that our algorithm is capable of generating high-quality outputs, as well as enhancing human creativity. More broadly, we seek to inspire more computational work around the topic of linguistic creativity, which we believe offers numerous unexplored opportunities.