Mario Corrales-Astorgano

2026

Deep Learning-Based Multi-Aspect Pronunciation Assessment for Individuals with Down Syndrome
David Fernández-García | César González-Ferreras | Valentín Cardeñoso-Payo | Mario Corrales-Astorgano
Proceedings of the Fifteenth Language Resources and Evaluation Conference

This paper explores the use of an annotated speech corpus to assess multiple dimensions of speech quality (particularly phonetics, fluency, and prosody) in individuals with Down syndrome, with the aim of informing the development of automated assessment tools. We conducted a series of experiments using the GOPT model together with representations extracted from Wav2Vec models fine-tuned for phoneme classification. Model predictions were compared against expert annotations from a speech-language pathologist using Pearson correlation. Results show significant improvements over prior work, with correlations of up to 0.49 for some aspects, particularly in the phonetic and fluency dimensions, while prosody remained more challenging to model. The study highlights the potential of Transformer-based architectures for atypical speech assessment and underscores the challenges inherent in that task, particularly the variability linked to specific disfluency types.

2016


On the Use of a Serious Game for Recording a Speech Corpus of People with Intellectual Disabilities
Mario Corrales-Astorgano | David Escudero-Mancebo | Yurena Gutiérrez-González | Valle Flores-Lucas | César González-Ferreras | Valentín Cardeñoso-Payo
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the recording of a speech corpus focused on the prosody of people with intellectual disabilities. A video game is used for the recordings in order to improve user motivation. The players' profiles and the sentences recorded during the game sessions are also described. To identify the main prosodic difficulties of people with intellectual disabilities, prosodic features such as fundamental frequency, energy, and pauses are extracted from the recordings. The recordings of people with intellectual disabilities are then compared with those of people without intellectual disabilities. This comparison shows that pauses are the most discriminative feature between the two groups. To verify this, a study was carried out using machine learning techniques, achieving a classification rate above 80%.

Co-authors

Yurena Gutiérrez-González 1

Venues

LREC 2
