Ioana Vasilescu

Also published as: I. Vasilescu


2020

Lénition et fortition des occlusives en coda finale dans deux langues romanes : le français et le roumain (Lenition and fortition of word-final stops in two Romance languages: French and Romanian)
Mathilde Hutin | Adèle Jatteau | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

The automated exploration of large corpora enables a finer-grained analysis of the relation between patterns of synchronic phonetic variation and diachronic change: errors in automatic transcriptions are highly informative both about contextual variation in continuous speech and about possible systemic mutations on the verge of emerging. It is therefore worth examining phonological phenomena that are widely attested across languages, in diachrony as in synchrony, in order to establish whether or not they are emerging in languages not yet subject to them. The present study thus uses forced alignment with pronunciation variants to observe voicing alternations in word-final coda position in two Romance languages: French and Romanian. We show, in particular, that the non-canonical voicing and devoicing of French and Romanian codas are not random, but are genuine instances of final devoicing and of regressive laryngeal-feature assimilation, whether of voicing or of voicelessness.

Lenition and Fortition of Stop Codas in Romanian
Mathilde Hutin | Oana Niculescu | Ioana Vasilescu | Lori Lamel | Martine Adda-Decker
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

The present paper provides a first study of lenition- and fortition-type phenomena in coda position in Romanian, a language that can be considered less-resourced. Our data show that there are two contexts for devoicing in Romanian: before a voiceless obstruent, which means that there is regressive voicelessness assimilation in the language, and before a pause, which means that there is a tendency towards final devoicing proper. The data also show that non-canonical voicing is an instance of voicing assimilation, as it is observed mainly before voiced consonants (voiced obstruents and sonorants alike). Two conclusions can be drawn from our analyses. First, from a phonetic point of view, the two devoicing phenomena exhibit the same behavior with regard to the place of articulation of the coda, while voicing assimilation displays the reverse tendency. In particular, alveolars, which tend to devoice the most, also voice the least. Second, the two assimilation processes have similarities that could distinguish them from final devoicing as such. Final devoicing seems to be sensitive to speech style and speaker gender, while the assimilation processes are not. This may indicate that the two kinds of processes are phonologized to different degrees in the language, assimilation being more accepted and generalized than final devoicing.

2016

Réalisation phonétique et contraste phonologique marginal : une étude automatique des voyelles du roumain (Phonetic realization and marginal phonemic contrast: an automatic study of the Romanian vowels)
Ioana Vasilescu | Margaret Renwick | Camille Dutrey | Lori Lamel | Bianca Vieru
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

This article presents an acoustic analysis of Romanian vowels: productions in continuous speech are compared with “laboratory” pronunciations. The goals are: (1) to describe the acoustic features of the vowels as a function of speech style; (2) to assess the relation between acoustic features and the phonemic contrasts of the language; (3) to assess to what extent the study of spoken data sheds light on the phonemic status of the central vowels [ə] and [ɨ], whose status (phonemes vs. allophones) is controversial. We show that the acoustic features are comparable in journalistic vs. controlled speech for the whole inventory except [ə] and [ɨ]. In controlled speech [ə] and [ɨ] are distinct, but in continuous speech they merge in favour of the quality [ə]. This merger of qualities is not a source of unintelligibility, as [ə] and [ɨ] are in near-complementary distribution. This result sheds light on the question of gradient and marginal phonemic contrast (Goldsmith, 1995; Scobbie & Stuart-Smith, 2008; Hall, 2013).

2014

Morpho-Syntactic Study of Errors from Speech Recognition System
Maria Goryainova | Cyril Grouin | Sophie Rosset | Ioana Vasilescu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The study provides an original standpoint on speech transcription errors by focusing on the morpho-syntactic features of the erroneous chunks and of the surrounding left and right contexts. The typology concerns the forms, lemmas and POS involved in erroneous chunks and in the surrounding contexts. Comparisons with error-free contexts are also provided. The study is conducted on French. The morpho-syntactic analysis underlines that three main classes are particularly represented in the erroneous chunks: (i) grammatical words (to, of, the), (ii) auxiliary verbs (has, is), and (iii) modal verbs (should, must). Such items are widely encountered in ASR outputs as frequent candidates for transcription errors. The analysis of the context points out that some left 3-gram contexts (e.g., repetitions, that is disfluencies, bracketing formulas such as “c’est”, etc.) may be better predictors than others. Finally, the surface analysis, conducted using Levenshtein distance, highlighted that the most common distance is 2 characters and mainly involves differences between inflected forms of a single item.
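The surface analysis mentioned above relies on the Levenshtein (edit) distance between word forms. As a minimal sketch of the metric, not the authors' implementation (the function name and the example word pair are illustrative assumptions), the classic dynamic-programming formulation is:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion from a
                            curr[j - 1] + 1,             # insertion into a
                            prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = curr
    return prev[-1]

# Two inflected forms of the same French verb differ by 2 edits,
# matching the most common error distance reported in the study.
print(levenshtein("était", "étaient"))  # → 2
```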

Human annotation of ASR error regions: Is “gravity” a sharable concept for human annotators?
Daniel Luzzati | Cyril Grouin | Ioana Vasilescu | Martine Adda-Decker | Eric Bilinski | Nathalie Camelin | Juliette Kahn | Carole Lailler | Lori Lamel | Sophie Rosset
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper is concerned with human assessments of the severity of errors in ASR outputs. We deliberately provided no guidelines, so that each annotator involved in the study could judge the “seriousness” of an ASR error according to their own scientific background. Eight human annotators took part in an annotation task on three distinct corpora, one of which was annotated twice without the annotators being told of the duplicate. None of the computed results (inter-annotator agreement, edit distance, majority annotation) show any strong correlation between the considered criteria and the level of seriousness, which underlines how difficult it is for a human to determine whether an ASR error is serious or not.

2012

Quel est l’apport de la détection d’entités nommées pour l’extraction d’information en domaine restreint ? (What is the contribution of named entities detection for information extraction in restricted domain ?) [in French]
Camille Dutrey | Chloé Clavel | Sophie Rosset | Ioana Vasilescu | Martine Adda-Decker
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Cross-lingual studies of ASR errors: paradigms for perceptual evaluations
Ioana Vasilescu | Martine Adda-Decker | Lori Lamel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

It is well known that human listeners significantly outperform machines when it comes to transcribing speech. This paper presents a progress report on the joint research into automatic vs. human speech transcription and on the perceptual experiments developed at LIMSI, which aim to increase our understanding of automatic speech recognition errors. Two paradigms are described in which human listeners are asked to transcribe speech segments containing words that are frequently misrecognized by the system. In particular, we sought to gain information about the impact of increased context in helping humans disambiguate problematic lexical items, typically homophone or near-homophone words. The long-term aim of this research is to improve the modeling of ambiguous contexts so as to reduce automatic transcription errors.

2010

On the Role of Discourse Markers in Interactive Spoken Question Answering Systems
Ioana Vasilescu | Sophie Rosset | Martine Adda-Decker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a preliminary analysis of the role of some discourse markers and of the vocalic hesitation “euh” in a corpus of spoken human utterances collected with the Ritel system, an open-domain spoken dialog system. The frequency and contextual combinations of classical discourse markers and of the vocalic hesitation have been studied. This analysis revealed some specificities in how the analyzed items combine. The classical discourse markers seem to help initiate larger discursive blocks in both initial and medial positions of the ongoing turns. The vocalic hesitation also serves to mark the user’s embarrassment and wish to close the dialog.

2008

Speech Errors on Frequently Observed Homophones in French: Perceptual Evaluation vs Automatic Classification
Rena Nemoto | Ioana Vasilescu | Martine Adda-Decker
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The present contribution aims at increasing our understanding of automatic speech recognition (ASR) errors involving frequent homophone or near-homophone words by confronting them with perceptual results. The long-term aim is to improve the acoustic modelling of these items so as to reduce automatic transcription errors. A first question addressed in this paper is whether homophone words such as “et” (and) and “est” (to be), for which ASR systems rely on language model weights, can be discriminated in a perceptual transcription test with similar n-gram constraints. A second question concerns the acoustic separability of the two homophone words using appropriate acoustic and prosodic attributes. The perceptual test reveals that even though automatic and perceptual errors correlate positively, human listeners deal with local ambiguity more efficiently than the ASR system, in conditions which attempt to approximate the information available to a 4-gram language model for decision. The corresponding acoustic analysis shows that the two homophone words can be distinguished thanks to some relevant acoustic and prosodic attributes. A first experiment in automatic classification of the two words using data mining techniques highlights the role of prosodic (duration and voicing) and contextual (co-occurrence with pauses) information in distinguishing the two words. The current results, though preliminary, suggest that new levels of information, so far unexplored in pronunciation modelling for ASR, may be considered in order to efficiently factorize the word variants observed in speech and to improve automatic speech transcription.

2006

Fear-type emotions of the SAFE Corpus: annotation issues
Chloé Clavel | Ioana Vasilescu | Laurence Devillers | Thibaut Ehrette | Gaël Richard
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The present research focuses on annotation issues in the context of the acoustic detection of fear-type emotions for surveillance applications. The emotional speech material used for this study comes from the previously collected SAFE Database (Situation Analysis in a Fictional and Emotional Database), which consists of audio-visual sequences extracted from fiction movies. A generic annotation scheme was developed to annotate the various emotional manifestations contained in the corpus. The annotation was carried out by two labellers and the two annotation strategies are compared. It emerges that the borderline between emotion and neutral varies according to the labeller. An acoustic validation by a third labeller allows the two strategies to be analysed. Two human strategies are observed: a first one, context-oriented, which mixes audio and contextual (video) information in emotion categorization; and a second one based mainly on audio information. The k-means clustering confirms the role of audio cues in human annotation strategies, and in particular helps to evaluate those strategies from the point of view of a detection system based on audio cues.

2004

Reliability of Lexical and Prosodic Cues in Two Real-life Spoken Dialog Corpora
L. Devillers | I. Vasilescu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)