Fine-tuned speech representations track spoken language convergence to adult models in infants and children who are deaf/hard-of-hearing

Landon Choy, Ali Sartaz Khan, Sonia Patrizi, Daisy S. Ye, Julianna Gross, Margaret Cychosz


Abstract
Language development is characterized by a gradual convergence of children’s speech toward adult patterns. Measuring this process has traditionally required detailed transcription and language-specific expertise, limiting scalability across languages and populations. Here, we use fine-tuned speech embeddings to capture this convergence directly from the acoustic signal in long-form, child-centered recordings made as children go about their daily lives. Using BabyHuBERT, we extracted embeddings from vocalizations of children who are deaf/hard-of-hearing and their female adult caregivers (more than 925 hours of observation). Embedding distance between children and caregivers decreased with hearing age, controlling for pitch, indicating, as expected, that children’s speech patterns converge toward those of their caregivers over development. This single distance metric was also related to multiple standardized measures of speech and language, from infancy through the preschool years. These results suggest a path toward scalable, language-neutral assessment of spoken language development from children’s everyday lives.
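As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch, not the authors' pipeline. It assumes that utterance-level embeddings have already been extracted, that child-caregiver distance is taken as cosine distance between mean-pooled embeddings, and that pitch is controlled by entering it as a covariate in an ordinary least-squares regression. The abstract does not specify any of these choices, and the function names `centroid_cosine_distance` and `age_slope_controlling_for_pitch` are hypothetical.

```python
import numpy as np


def centroid_cosine_distance(child_embs, adult_embs):
    """Cosine distance between mean-pooled child and adult embeddings.

    Both inputs are arrays of shape (n_vocalizations, embedding_dim).
    Mean pooling and cosine distance are assumptions made for this sketch.
    """
    child = np.asarray(child_embs, dtype=float).mean(axis=0)
    adult = np.asarray(adult_embs, dtype=float).mean(axis=0)
    cosine = child @ adult / (np.linalg.norm(child) * np.linalg.norm(adult))
    return 1.0 - cosine


def age_slope_controlling_for_pitch(distance, hearing_age, pitch_gap):
    """Slope of distance on hearing age, with a pitch covariate (assumed OLS).

    Each argument holds one value per child or recording. A negative slope
    means distance shrinks as hearing age increases.
    """
    design = np.column_stack([np.ones(len(distance)), hearing_age, pitch_gap])
    coefs, *_ = np.linalg.lstsq(
        design, np.asarray(distance, dtype=float), rcond=None
    )
    return coefs[1]  # coefficient on hearing age
```

Under these assumptions, a negative value returned by `age_slope_controlling_for_pitch` would correspond to the convergence pattern reported above: smaller child-caregiver distance at greater hearing age, after accounting for pitch.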
Anthology ID:
2026.cdl-1.8
Volume:
Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL)
Month:
July
Year:
2026
Address:
Grand Hyatt Manchester San Diego, 1 Market Pl, San Diego, CA 92101
Editors:
Martin Ziqiao Ma, Emmy Liu, Jing Liu, Tyler A. Chang, Abdellah Fourtassi, Alex Warstadt, Michael Hahn, Weiwei Sun, Freda Shi
Venues:
CDL | WS
Publisher:
Association for Computational Linguistics
Pages:
27–36
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.cdl-1.8/
Cite (ACL):
Landon Choy, Ali Sartaz Khan, Sonia Patrizi, Daisy S. Ye, Julianna Gross, and Margaret Cychosz. 2026. Fine-tuned speech representations track spoken language convergence to adult models in infants and children who are deaf/hard-of-hearing. In Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL), pages 27–36, Grand Hyatt Manchester San Diego, 1 Market Pl, San Diego, CA 92101. Association for Computational Linguistics.
Cite (Informal):
Fine-tuned speech representations track spoken language convergence to adult models in infants and children who are deaf/hard-of-hearing (Choy et al., CDL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.cdl-1.8.pdf