Daisy S. Ye
2026
Fine-tuned speech representations track spoken language convergence to adult models in infants and children who are deaf/hard-of-hearing
Landon Choy | Ali Sartaz Khan | Sonia Patrizi | Daisy S. Ye | Julianna Gross | Margaret Cychosz
Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL)
Language development is characterized by a gradual convergence of children’s speech toward adult patterns. Measuring this process has traditionally required detailed transcription and language-specific expertise, which limits scalability across languages and populations. Here, we use fine-tuned speech embeddings to capture this convergence directly from the acoustic signal in longform, child-centered recordings collected as children go about their daily lives. Using BabyHuBERT, we extracted embeddings from the vocalizations of children who are deaf/hard-of-hearing and of their female adult caregivers (>925 hours of observation). Controlling for pitch, the embedding distance between children and caregivers decreased with hearing age. As expected, this indicates that children’s speech patterns converge toward those of their caregivers over development. This single distance metric was also related to multiple standardized measures of speech and language, from infancy through the preschool years. These results suggest a path toward scalable, language-neutral assessment of spoken language development from children’s everyday lives.
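The abstract does not state which distance, pooling, or regression model is used, so the following is only a minimal sketch of the general idea under explicit assumptions. It assumes the frame-level embeddings have already been extracted (no BabyHuBERT call is shown). Each vocalization is mean-pooled over frames, and each speaker is summarized by the centroid of their vocalizations. Cosine distance between the child and caregiver centroids serves as the convergence metric. "Controlling for pitch" is illustrated as an ordinary least-squares coefficient on hearing age with a per-child pitch covariate. All function names here are hypothetical.

```python
import math
from statistics import fmean

Vector = list[float]


def average(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors (frames or vocalizations)."""
    dim = len(vectors[0])
    return [fmean(v[d] for v in vectors) for d in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def child_caregiver_distance(child_vocs: list[list[Vector]],
                             adult_vocs: list[list[Vector]]) -> float:
    """Hypothetical metric: cosine distance between speaker centroids.

    Each vocalization is a list of frame-level embeddings (assumed precomputed).
    Frames are mean-pooled per vocalization, then vocalizations are averaged
    per speaker.
    """
    child = average([average(frames) for frames in child_vocs])
    adult = average([average(frames) for frames in adult_vocs])
    return cosine_distance(child, adult)


def _slope(x: list[float], y: list[float]) -> float:
    """OLS slope of y on x, fitted with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def _residualize(y: list[float], on: list[float]) -> list[float]:
    """Residuals of y after an OLS fit on one covariate plus intercept."""
    b = _slope(on, y)
    a = fmean(y) - b * fmean(on)
    return [yi - (a + b * xi) for xi, yi in zip(on, y)]


def age_effect_controlling_for_pitch(distance: list[float],
                                     hearing_age: list[float],
                                     pitch: list[float]) -> float:
    """Coefficient on hearing age in the OLS model distance ~ 1 + hearing_age + pitch.

    By the Frisch-Waugh-Lovell theorem, this equals the slope obtained by
    regressing pitch-residualized distance on pitch-residualized hearing age.
    """
    return _slope(_residualize(hearing_age, pitch), _residualize(distance, pitch))
```

Here, one distance value per child (or per recording) would be paired with that child's hearing age and a pitch covariate. A negative coefficient corresponds to the convergence described above. With repeated recordings per child, a mixed-effects model would likely be more appropriate than this plain OLS illustration.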