Valéria Santos

Also published as: Valeria Santos


2026

Training Large Language Models (LLMs) relies predominantly on written, curated corpora, which may limit their reliability on spontaneous speech. Oral language exhibits real-time planning markers — filled pauses, repetitions, false starts, and vowel lengthenings — that modulate epistemic commitment. This pilot study investigates how such disfluencies affect the alignment between LLM confidence and a discourse-pragmatic uncertainty proxy in a Portuguese model (Llama-3.1-8B-Instruct). Using a benchmark of 344 turns from the Roda Viva corpus, we contrast faithful Conversation Analysis transcriptions with sanitized versions and combine binned divergence metrics (ECE, OE) with rank correlation and multivariate regression analyses. We find that model confidence is overwhelmingly driven by a surface feature — turn length (${\beta_{\text{std}}} = +14.47, p 0.001$) — rather than by pragmatic markers of uncertainty (${\beta_{\text{oral}}} = -3.09, {\beta_{\text{hedges}}} = -0.97$, both non-significant; $R2 = 0.29$). After controlling for length, residual effects of disfluency markers align in the human-expected direction but are dwarfed by length bias. We argue that this surface-feature dominance subsumes the pragmatic blindness phenomenon and explains the substantial divergence observed via ECE (41.95) and OE (4.29) between faithful and sanitized conditions.

2024