Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Human-LLM Dialogue

Jonathan Ivey; Shivani Kumar; Jiayu Liu; Hua Shen; Sushrita Rakshit; Rohan Raju; Haotian Zhang; Aparna Ananthasubramaniam; Junghwan Kim; Bowen Yi; Dustin Wright; Abraham Israeli; Anders Giovanni Møller; Lechen Zhang; David Jurgens

Abstract

Building datasets for dialogue tasks is expensive and time-consuming, requiring recruitment, training, and data collection from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, how well do LLM-based simulations reflect real human dialogue? In this work, we answer this question by generating a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset and quantifying how well the LLM simulations align with their human counterparts. Overall, we find relatively low alignment between simulations and human interactions, with systematic differences in multiple textual properties, including style and conversational dynamics. Further, we find that models perform similarly in simulating English, Chinese, and Russian dialogues. Our results also suggest that LLMs only simulate a narrow range of the overall distribution of human dialogue, as they perform better on the subset of humans who write similarly to the LLM’s own style.

Anthology ID: 2026.findings-acl.2060
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41395–41432
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2060/
Cite (ACL): Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, and David Jurgens. 2026. Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Human-LLM Dialogue. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41395–41432, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Human-LLM Dialogue (Ivey et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2060.pdf
Checklist: 2026.findings-acl.2060.checklist.pdf
