How Much Data for Stable Formant Values? Pipeline for Convergence Detection Based on Read Speech

Kayla Sward, Johan Sjons, Axel G. Ekstrom


Abstract
This study investigates the stability and convergence of vowel formants (F1, F2, F3) in read speech using a large corpus of audiobook recordings. While most formant studies rely on brief, isolated utterances recorded in laboratory settings, this analysis draws on 3,384 chapters (about 942 hours) of continuous, stylistically varied speech from publicly available audiobooks. The data were processed with an automated pipeline comprising transcription, phoneme alignment, and formant extraction. Six statistical techniques (First Token Within (FTW), Cumulative Sum (CUSUM), Two-Sample t-Test, Confidence Interval (CI) Shrinkage, Piecewise Linear Fitting (PWLF), and Binary Segmentation (BinSeg)) were compared for their effectiveness in identifying stabilization points. Findings indicate that formant means generally stabilize within 60 to 230 vowel tokens per phoneme, depending on vowel type and speaker gender. Of the methods evaluated, CUSUM yielded the most consistent and informative results. The results provide practical guidelines for determining how much non-laboratory speech is required to obtain reliable vowel formant averages.
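
To make the idea of a stabilization point concrete, the following is a minimal sketch of one simple tolerance-based criterion: the smallest token count after which the running mean of a formant stays within a fixed band around its final value. It is not the authors' definition of FTW, CUSUM, or any other method named in the abstract; the function names, the tolerance `tol_hz`, and the synthetic F1 values are assumptions made purely for illustration.

```python
import numpy as np


def running_mean(values):
    """Return the cumulative mean after each successive token."""
    values = np.asarray(values, dtype=float)
    return np.cumsum(values) / np.arange(1, len(values) + 1)


def first_stable_index(values, tol_hz=10.0):
    """Smallest token count n such that the running mean stays within
    tol_hz of the final mean for every count from n onward.

    Hypothetical criterion for illustration only; tol_hz is an assumed
    tolerance, not a value taken from the paper.
    """
    rm = running_mean(values)
    outside = np.abs(rm - rm[-1]) > tol_hz
    if not outside.any():
        return 1  # the running mean never leaves the tolerance band
    last_outside = int(np.nonzero(outside)[0].max())
    # Convert the 0-based index to a token count, then step to the next token.
    return last_outside + 2


# Synthetic F1 values in Hz (assumed distribution, not data from the study).
rng = np.random.default_rng(0)
f1 = rng.normal(loc=500.0, scale=60.0, size=1000)
print("tokens until running mean stays within 5 Hz:",
      first_stable_index(f1, tol_hz=5.0))
```

A criterion like this depends on the final mean, so it can only be applied retrospectively to a complete series; sequential methods such as CUSUM are designed to detect change as the series unfolds.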
Anthology ID:
2026.lrec-main.470
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resources Association
Pages:
5916–5925
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.470/
Cite (ACL):
Kayla Sward, Johan Sjons, and Axel G. Ekstrom. 2026. How Much Data for Stable Formant Values? Pipeline for Convergence Detection Based on Read Speech. International Conference on Language Resources and Evaluation, main:5916–5925.
Cite (Informal):
How Much Data for Stable Formant Values? Pipeline for Convergence Detection Based on Read Speech (Sward et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.470.pdf