Mark Dourado


2025

Computational models of dialogue often struggle to capture the nuanced structure of spontaneous conversation, particularly in polyadic, real-world settings. We introduce a multilayered annotation protocol designed for the GaMMA corpus, a Danish dataset of four-person conversations recorded in both quiet and noisy environments. The protocol targets four key interactional phenomena: Turn Construction Units, backchannels, floor transfer attempts, and repair sequences. Each annotation layer is grounded in Conversation Analysis while remaining machine-actionable, which enables alignment with multimodal data such as gaze and motion. We report inter-annotator agreement metrics across annotation tiers and discuss how the protocol supports both fine-grained interaction analysis and the training of context-aware dialogue models.
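As an illustration of what a per-tier agreement score can look like, the following minimal sketch computes Cohen's kappa for two annotators who labelled the same sequence of segments. The choice of Cohen's kappa is an assumption made here for illustration, since the abstract does not name the metrics used; the function name `cohen_kappa` and the toy label sequences are likewise hypothetical and are not drawn from the GaMMA corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: share of segments given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label proportions,
    # summed over labels. Counter returns 0 for labels one annotator never used.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six segments on one annotation tier (illustrative only).
annotator_1 = ["TCU", "backchannel", "TCU", "repair", "TCU", "backchannel"]
annotator_2 = ["TCU", "backchannel", "TCU", "TCU", "TCU", "repair"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

This sketch assumes both annotators labelled the same pre-segmented units. Tiers where annotators also choose the segment boundaries, as is common for time-aligned annotation, would need a preliminary alignment step or a different agreement measure.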