Long-Form Recordings to Study Children’s Language Input and Output in Under-Resourced Contexts

Joseph R. Coffey; Alejandrina Cristia

Abstract

A growing body of research suggests that young children’s early speech and language exposure is associated with later language development (including delays and diagnoses), school readiness, and academic performance. The last decade has seen increasing use of child-worn devices to collect long-form audio recordings by educators, economists, and developmental psychologists. The most commonly used system for analyzing this data is LENA, which was trained on North American English child-centered data and generates estimates of children’s speech-like vocalization counts, adult word counts, and child-adult turn counts. Recently, cheaper and open-source non-LENA alternatives with multilingual training have been proposed. Both kinds of systems have been employed in under-resourced, sometimes multilingual contexts, including Africa, where access to printed or digital linguistic resources may be limited. In this paper, we describe each kind of system (LENA, non-LENA), provide information on audio data collected with them that is available for reuse, review evidence of the accuracy of extant automated analyses, and note potential strengths and shortcomings of their use in African communities.

Anthology ID: 2024.rail-1.3
Volume: Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Venues: RAIL | WS
Publisher: ELRA and ICCL
Pages: 20–31
URL: https://preview.aclanthology.org/landing_page/2024.rail-1.3/
Cite (ACL): Joseph R. Coffey and Alejandrina Cristia. 2024. Long-Form Recordings to Study Children’s Language Input and Output in Under-Resourced Contexts. In Proceedings of the Fifth Workshop on Resources for African Indigenous Languages @ LREC-COLING 2024, pages 20–31, Torino, Italia. ELRA and ICCL.
Cite (Informal): Long-Form Recordings to Study Children’s Language Input and Output in Under-Resourced Contexts (Coffey & Cristia, RAIL 2024)
PDF: https://preview.aclanthology.org/landing_page/2024.rail-1.3.pdf
