Evaluating spontaneous speech is crucial when assessing second language (L2) proficiency. This paper presents a corpus of spontaneous L2 English speech, focusing on the performance of speakers at B1 and B2 proficiency levels. Two hundred and sixty university students were recorded during a speaking task that is part of a French national certificate in English. The task was a 10-minute role-play in which 2 or 3 candidates argued about a controversial topic in order to reach a negotiated compromise. Each student’s performance was evaluated by two experts, who assigned it to the B2, B1 or below-B1 speaking proficiency level. Automatic diarization, transcription, and word-level alignment were performed on the recorded conversations in order to analyse lexical stress realisation in polysyllabic plain words produced by B1 and B2 students. Results showed that only 35.4% of the 6,350 targeted words had stress detected on the expected syllable, revealing a common stress shift to the final syllable. In addition to substantial inter-speaker variability (0% to 68.4%), B2 speakers showed slightly higher stress accuracy (36%) than B1 speakers (29.6%). Speakers with accurate stress placement used F0 and intensity to mark syllable prominence, while speakers with lower accuracy tended to lengthen the last syllable of words, with minimal changes in other dimensions.
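As a rough illustration of how the accuracy figures above could be tallied, the following sketch computes the share of words whose detected stress falls on the expected syllable, grouped by speaker or by proficiency level, and the share of misplaced stresses that land on the final syllable. The record layout, the field names, and the toy values are assumptions made for this example; they are not taken from the corpus, and the abstract does not specify how stress was detected.

```python
from collections import defaultdict


def stress_accuracy(tokens, key):
    """Return {group: share of tokens with stress detected on the expected syllable}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for token in tokens:
        group = token[key]
        totals[group] += 1
        if token["detected"] == token["expected"]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def final_shift_rate(tokens):
    """Share of misplaced stresses that fall on the last syllable of the word."""
    errors = [t for t in tokens if t["detected"] != t["expected"]]
    if not errors:
        return 0.0
    final = sum(1 for t in errors if t["detected"] == t["n_syllables"] - 1)
    return final / len(errors)


# Toy records, invented for illustration (syllable indices start at 0).
TOKENS = [
    {"speaker": "s01", "level": "B2", "word": "important",
     "n_syllables": 3, "expected": 1, "detected": 1},
    {"speaker": "s01", "level": "B2", "word": "problem",
     "n_syllables": 2, "expected": 0, "detected": 1},
    {"speaker": "s02", "level": "B1", "word": "company",
     "n_syllables": 3, "expected": 0, "detected": 2},
    {"speaker": "s02", "level": "B1", "word": "decision",
     "n_syllables": 3, "expected": 1, "detected": 2},
]

by_speaker = stress_accuracy(TOKENS, "speaker")
by_level = stress_accuracy(TOKENS, "level")
shift_rate = final_shift_rate(TOKENS)
```

Grouping by speaker gives the kind of inter-speaker range reported above, while grouping by level gives the B1 versus B2 comparison.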
This study aims to propose a quantification of foreign accent based on rhythmic measures. We used the Corpus pour l’Étude du Français Contemporain, which offers more than 300 hours of speech covering varied speaker profiles and situations. We focused on 16 temporal parameters estimated from voicing and syllable durations. A Gaussian mixture was trained on data from 1,340 native speakers of French, then tested on excerpts from 146 randomly drawn native speakers (NS), on those of the 37 non-native speakers present in the corpus (NNS), and on recordings of 29 A2-level Japanese learners from another corpus. The probability that NNS have a lower log-likelihood than NS does not go beyond a tendency (p = 0.067), but that for the Japanese learners is far more significant (p < 0.0001). Examining how the parameters are distributed across the groups highlights the importance of speech rate and voicing durations.
This work aims to better understand and model the role of rhythm in foreign accent. We built a model of French rhythm that takes its variability into account, using the Corpus pour l’Étude du Français Contemporain (CEFC), which contains up to 300 hours of speech from a wide variety of speaker profiles and situations. Sixteen parameters were computed, each based on segment durations such as voicing and intersyllabic timing. All parameters are detected fully automatically from the signal, without ASR or transcription. A Gaussian mixture model was trained on 1,340 native speakers of French; any speech sample of at least 30 seconds can be scored to obtain its likelihood under this model. We tested it on 146 native test speakers (NS) and 37 non-native speakers (NNS) from the same corpus, and on 29 Japanese learners of French (JpNNS) from an independent corpus. The difference in log-likelihood between NNS and NS was only a tendency (p = .067), possibly because of the heterogeneous French proficiency of these speakers; the difference was far more significant for JpNNS (p < .0001), who were all at A2 level. Eta-squared effect sizes showed that the most discriminating parameters were intersyllabic mean duration and coefficient of variation, along with speech rate, for NNS, and speech rate and phonation ratio for JpNNS.
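To make the scoring step concrete, here is a minimal sketch of how a speech sample's log-likelihood under a Gaussian mixture can be computed once the mixture parameters are known. It is not the authors' implementation: the diagonal covariances, the two-component and two-parameter setup, and all numeric values are assumptions chosen for illustration, and the training step (fitting the mixture on native-speaker data) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a mixture, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical two-component model over two rhythm parameters
# (speech rate in syllables per second, phonation ratio).
# All values are invented; the model in the abstract uses 16 parameters.
WEIGHTS = [0.6, 0.4]
MEANS = [[5.5, 0.70], [4.8, 0.62]]
VARIANCES = [[0.40, 0.004], [0.50, 0.006]]

# A sample close to the modelled rhythm and one far from it.
native_like = gmm_log_likelihood([5.3, 0.68], WEIGHTS, MEANS, VARIANCES)
learner_like = gmm_log_likelihood([3.0, 0.45], WEIGHTS, MEANS, VARIANCES)
```

A sample whose parameters sit far from the modelled distribution receives a lower log-likelihood, which is the quantity compared across the NS, NNS and JpNNS groups.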