Timothy T. Rogers


2025

pdf bib
Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding
Yun-Shiuan Chuang | Sameer Narendran | Nikunj Harlalka | Alexander Cheung | Sizhe Gao | Siddharth Suresh | Junjie Hu | Timothy T. Rogers
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Guesstimation—the task of making approximate quantitative estimates about objects or events—is a common real-world skill, yet remains underexplored in large language model (LLM) research. We introduce three guesstimation datasets: MARBLES, FUTURE, and ELECPRED, spanning physical estimation (e.g., how many marbles fit in a cup) to abstract predictions (e.g., the 2024 U.S. presidential election). Inspired by the social science concept of Wisdom of Crowds (WOC)—where the median of multiple estimates improves accuracy—we propose WOC decoding for LLMs. We replicate WOC effects in human participants and find that LLMs exhibit similar benefits: median aggregation across sampled responses consistently improves accuracy over greedy, self-consistency decoding, and mean decoding. This suggests that LLMs encode a world model that supports approximate reasoning. Our results position guesstimation as a useful probe of LLM world knowledge and highlight WOC decoding as a strategy for enhancing LLM guesstimation performance on real-world tasks.

2024

pdf bib
Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks
Yun-Shiuan Chuang | Krirk Nirunwiroj | Zach Studdiford | Agam Goyal | Vincent V. Frigo | Sijia Yang | Dhavan V. Shah | Junjie Hu | Timothy T. Rogers
Findings of the Association for Computational Linguistics: EMNLP 2024

Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estimated a belief network encompassing 64 topics loading on nine non-overlapping latent factors. We then seeded LLM-based agents with an opinion on one topic, and assessed the alignment of its expressed opinions on remaining test topics with corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding the agent with a single belief greatly improved alignment for topics related in the belief network, and not for topics outside the network. These results suggest a novel path for human-LLM belief alignment in work seeking to simulate and understand patterns of belief distributions in society.