How Useful is Context, Actually? Comparing LLMs and Humans on Discourse Marker Prediction

Emily Sadlier-Brown, Millie Lou, Miikka Silfverberg, Carla Kam


Abstract
This paper investigates the adverbial discourse particle "actually". We compare LLM and human performance on cloze tests involving "actually", using examples sourced from the Providence Corpus of speech around children, and explore the impact of utterance context on cloze test performance. We find that context is always helpful, though how much additional context helps, and which placement of context relative to the masked word (i.e., before or after it) helps most, differ across individual models and humans. The best-performing LLM, GPT-4, narrowly outperforms humans. In an additional experiment, we explore cloze performance on synthetic LLM-generated examples and find that several models vastly outperform humans.
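To make the cloze setup in the abstract concrete, the following is a minimal sketch of how an item with a variable amount of preceding and following context might be built. It is an illustration only: the mask token, the function and parameter names (make_cloze_item, n_before, n_after), and the example utterances are all assumptions introduced here, not the paper's actual item format or data from the Providence Corpus.

```python
import re
from typing import List, NamedTuple

MASK = "[MASK]"  # hypothetical mask token; the paper's actual format is not given here
TARGET = re.compile(r"\bactually\b", re.IGNORECASE)


class ClozeItem(NamedTuple):
    prompt: str  # masked utterance plus any surrounding context
    answer: str  # the word that was masked out


def make_cloze_item(
    utterances: List[str],
    target_idx: int,
    n_before: int = 0,
    n_after: int = 0,
) -> ClozeItem:
    """Mask the first 'actually' in the target utterance and attach
    up to n_before preceding and n_after following utterances."""
    target = utterances[target_idx]
    match = TARGET.search(target)
    if match is None:
        raise ValueError("target utterance does not contain 'actually'")
    masked = target[: match.start()] + MASK + target[match.end():]
    start = max(0, target_idx - n_before)
    before = utterances[start:target_idx]
    after = utterances[target_idx + 1 : target_idx + 1 + n_after]
    return ClozeItem(" ".join(before + [masked] + after), match.group(0))


# Invented utterances for illustration (not corpus data).
dialogue = [
    "Do you want the red cup?",
    "I actually want the blue one.",
    "Okay, here you go.",
]

no_context = make_cloze_item(dialogue, 1)
left_context = make_cloze_item(dialogue, 1, n_before=1)
right_context = make_cloze_item(dialogue, 1, n_after=1)
```

Varying n_before and n_after independently is one way to compare context placed before versus after the masked word, which is the contrast the abstract describes.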
Anthology ID:
2024.cmcl-1.20
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Yohei Oseki
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
231–241
URL:
https://aclanthology.org/2024.cmcl-1.20
DOI:
10.18653/v1/2024.cmcl-1.20
Cite (ACL):
Emily Sadlier-Brown, Millie Lou, Miikka Silfverberg, and Carla Kam. 2024. How Useful is Context, Actually? Comparing LLMs and Humans on Discourse Marker Prediction. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 231–241, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
How Useful is Context, Actually? Comparing LLMs and Humans on Discourse Marker Prediction (Sadlier-Brown et al., CMCL-WS 2024)
PDF:
https://preview.aclanthology.org/autopr/2024.cmcl-1.20.pdf