Medispeech: A French Reading and Spontaneous Speech Corpus for Sleepiness Estimation

Colleen Beaumard, Vincent P. Martin, Charles Brazier, Julien Coelho, Jean-Luc Rouas, Pierre Philip


Abstract
Excessive Daytime Sleepiness (EDS) is associated with several diseases and negatively affects the daily life of those who experience it. Its diagnosis and follow-up are difficult because they require a full day of testing at the hospital. Speech analysis could make it possible to monitor patients regularly in ecological conditions. Although several corpora containing speech from sleepy subjects exist, they do not meet ecological requirements regarding either the recording device or the speech elicitation tasks. In this paper, we introduce the Medispeech corpus, which contains reading, daily-life semi-spontaneous, and medically-oriented spontaneous tasks. Fifty-nine French subjects were recorded with both a professional-quality microphone and a smartphone using a dedicated application, resulting in 1,729 recordings for a total duration of 21 hours. Their EDS was assessed with both an objective physiological measurement (mean sleep latency measured during a clinical test) and a subjective questionnaire (Karolinska Sleepiness Scale). Subjects are phenotyped through socio-demographic and medical data related to diverse dimensions of sleepiness, comorbidities, and addictions. Finally, we analyse the validity of our data collection protocol by measuring the effective duration of speech (after discarding pauses) and assessing its links with the collected subjects’ characteristics.
Anthology ID:
2026.lrec-main.452
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
5737–5748
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.452/
Cite (ACL):
Colleen Beaumard, Vincent P. Martin, Charles Brazier, Julien Coelho, Jean-Luc Rouas, and Pierre Philip. 2026. Medispeech: A French Reading and Spontaneous Speech Corpus for Sleepiness Estimation. International Conference on Language Resources and Evaluation, main:5737–5748.
Cite (Informal):
Medispeech: A French Reading and Spontaneous Speech Corpus for Sleepiness Estimation (Beaumard et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.452.pdf