Raja Marjieh


2024

MacGyver: Are Large Language Models Creative Problem Solvers?
Yufei Tian | Abhilasha Ravichander | Lianhui Qin | Ronan Le Bras | Raja Marjieh | Nanyun Peng | Yejin Choi | Thomas Griffiths | Faeze Brahman
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. To this end, we create MACGYVER, an automatically generated dataset of over 1,600 real-world problems deliberately designed to trigger innovative usage of objects and necessitate out-of-the-box thinking. We then present our collection to both LLMs and humans to compare and contrast their problem-solving abilities. MACGYVER is challenging for both groups, but in unique and complementary ways. For instance, humans excel in tasks they are familiar with but struggle with those requiring domain-specific knowledge, leading to higher variance. In contrast, LLMs, exposed to a variety of specialized knowledge, attempt broader problems but fail by proposing physically infeasible actions. Finally, we provide a detailed error analysis of LLMs and demonstrate the potential of enhancing their problem-solving ability with novel prompting techniques such as iterative step-wise reflection and divergent-convergent thinking. This work (1) introduces a fresh arena for intelligent agents focusing on intricate aspects of physical reasoning, planning, and unconventional thinking, which supplements the existing spectrum of machine intelligence; and (2) provides insight into the constrained problem-solving capabilities of both humans and AI.
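
The abstract names "iterative step-wise reflection" but does not spell it out; the following is a minimal illustrative sketch of what such a propose-critique-revise loop might look like. The llm() helper and all prompt wording are assumptions, not the paper's actual method.

    # Illustrative sketch only: llm() and the prompts are assumptions,
    # not the paper's implementation.
    def llm(prompt: str) -> str:
        """Stand-in for a chat-completion call to an LLM."""
        raise NotImplementedError

    def solve_with_reflection(problem: str, max_rounds: int = 3) -> str:
        # Initial step-by-step proposal.
        solution = llm(f"Propose a step-by-step solution to:\n{problem}")
        for _ in range(max_rounds):
            # Step-wise reflection: check each step for physical
            # feasibility, the LLM failure mode the abstract highlights.
            critique = llm(
                "Review this solution step by step and flag any step "
                f"that is physically infeasible.\nProblem: {problem}\n"
                f"Solution: {solution}"
            )
            if "no issues" in critique.lower():
                break
            # Revise only the flagged steps and reflect again.
            solution = llm(
                "Revise the solution to fix the flagged steps.\n"
                f"Problem: {problem}\nSolution: {solution}\n"
                f"Critique: {critique}"
            )
        return solution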

Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
Dun-Ming Huang | Pol Van Rijn | Ilia Sucholutsky | Raja Marjieh | Nori Jacoby
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Conversational tones — the manners and attitudes in which speakers communicate — are essential to effective communication. As Large Language Models (LLMs) become increasingly popular, it is necessary to characterize the divergences in their conversational tones relative to humans. Prior research relied on pre-existing taxonomies or text corpora, which suffer from experimenter bias and may not be representative of real-world distributions. Inspired by methods from cognitive science, we propose an iterative method for simultaneously eliciting conversational tones and sentences, where participants alternate between two tasks: (1) one participant identifies the tone of a given sentence and (2) a different participant generates a sentence based on that tone. We run 50 iterations of this process with both human participants and GPT-4 and obtain a dataset of sentences and frequent conversational tones. In an additional experiment, humans and GPT-4 annotated all sentences with all tones. With data from 1,339 participants, 33,370 human judgments, and 29,900 GPT-4 queries, we show how our approach can be used to create an interpretable geometric representation of relations between tones in humans and GPT-4. This work showcases how combining ideas from machine learning and cognitive science can address challenges in human-computer interactions.
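
The alternating elicitation procedure described above (one participant labels a sentence's tone; a different participant writes a new sentence in that tone) can be summarized as a simple chain. The sketch below is illustrative only: identify_tone() and generate_sentence() stand in for a human participant or GPT-4 performing each task and are not the authors' code.

    # Illustrative sketch only: the two helpers stand in for a human
    # participant or GPT-4; they are assumptions, not the authors' code.
    def identify_tone(sentence: str) -> str:
        """Task (1): label the conversational tone of a sentence."""
        raise NotImplementedError

    def generate_sentence(tone: str) -> str:
        """Task (2): write a new sentence in the given tone."""
        raise NotImplementedError

    def run_chain(seed_sentence: str, iterations: int = 50) -> list[tuple[str, str]]:
        """Run one chain of the iterative tone/sentence elicitation."""
        sentence = seed_sentence
        records = []
        for _ in range(iterations):
            tone = identify_tone(sentence)      # one participant labels the tone
            sentence = generate_sentence(tone)  # a different participant generates
            records.append((tone, sentence))
        return records

Each chain alternates the two tasks, so the sentences and tones it visits trace out the distribution of conversational tones the population (human or GPT-4) finds natural.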