Abstract
The growing capabilities of large language models (LLMs) have inspired recent efforts to integrate LLM-generated dialogue into video games. However, evaluation remains a major challenge: how do we assess the player experience in a commercial game augmented with LLM-generated dialogue? To explore this question, we introduce a dynamic evaluation framework for the dialogue management systems that govern the task-oriented dialogue often found in role-playing video games. We first extract dialogue from the widely acclaimed role-playing game *Disco Elysium: The Final Cut*, which contains 1.1M words of dialogue spread across a complex graph of utterances where node reachability depends on game state (e.g., whether a certain item is held). Using this dataset, we have GPT-4 perform *dialogue infilling* to generate grounded utterances based on game state represented via code. In a user study with 28 players recruited from the r/DiscoElysium subreddit, the LLM outputs are evaluated against the game designers’ writing via both preference judgments and free-form feedback using a web interface that recreates the game’s core conversation functionality. Overall, the game designers’ prose is significantly preferred to GPT-4 generations, with participants citing reasons such as improved logical flow and grounding with the game state. To spur more principled future research in this area, we release our web interface and tools to enable researchers to build upon our work. https://pl.aiwright.dev

- Anthology ID:
- 2023.findings-emnlp.151
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2295–2311
- URL:
- https://aclanthology.org/2023.findings-emnlp.151
- DOI:
- 10.18653/v1/2023.findings-emnlp.151
- Cite (ACL):
- Nader Akoury, Qian Yang, and Mohit Iyyer. 2023. A Framework for Exploring Player Perceptions of LLM-Generated Dialogue in Commercial Video Games. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2295–2311, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- A Framework for Exploring Player Perceptions of LLM-Generated Dialogue in Commercial Video Games (Akoury et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.151.pdf