Clément François

2024

Daily auditory environments in French-speaking infants: A longitudinal dataset
Estelle Hervé | Clément François | Laurent Prevot
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Babies’ daily auditory environment plays a crucial role in language development. Most previous research estimating the quantitative and qualitative aspects of early speech input has focused on English- and Spanish-speaking families. In addition, validation studies of tools for analyzing daylong recordings are scarce for French data sets. In this paper, we present a French corpus of daylong audio recordings collected longitudinally with the LENA (Language ENvironment Analysis) system from infants aged 3 to 24 months. We conduct a thorough exploration of this data set, which serves as a quality check for both the data and the analysis tools. We evaluate the reliability of LENA metrics by systematically comparing them with those obtained from the ChildProject set of tools and by checking that the metrics follow their known dynamics with age. These metrics are also used to replicate, on our data set, the findings of Warlaumont et al. (2014) that infants’ speech vocalizations and the temporal contingencies between infants and caregivers increase with age.

2022

We present in this paper the first natural conversation corpus recorded with all modalities and neuro-physiological signals. Five dyads (10 participants) were recorded in three sessions of 30 minutes each, with a 4-day interval between sessions. During each session, audio and video are captured, as well as the neural signal (EEG with Emotiv-EPOC) and the electro-physiological one (with Empatica-E4). This resource is original in several respects. Technically, it is the first to gather all these types of data in a natural conversation situation. Moreover, recording the same dyads at different periods opens the door to new longitudinal investigations, such as the evolution of interlocutors’ alignment over time. The paper situates this new type of resource within the literature, presents the experimental setup, and describes the different annotations enriching the corpus.
