Yesika Laplaza


2014

TexAFon 2.0: A text processing tool for the generation of expressive speech in TTS applications
Juan María Garrido | Yesika Laplaza | Benjamin Kolz | Miquel Cornudella
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents TexAFon 2.0, an improved version of the text processing tool TexAFon, specially oriented to the generation of synthetic speech with expressive content. TexAFon is a text processing module in Catalan and Spanish for TTS systems, which performs all the typical tasks needed for the generation of synthetic speech from text: sentence detection, pre-processing, phonetic transcription, syllabication, prosodic segmentation and stress prediction. The improvements include a new normalisation module for the standardisation of chat text in Spanish, a module for the detection of the emotions expressed in the input text, and a module for the automatic detection of the intended speech acts, each of which is briefly described in the paper. The results of the evaluations carried out for each module are also presented.

2012

The I3MEDIA speech database: a trilingual annotated corpus for the analysis and synthesis of emotional speech
Juan María Garrido | Yesika Laplaza | Montse Marquina | Andrea Pearman | José Gregorio Escalada | Miguel Ángel Rodríguez | Ana Armenta
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this article the I3Media corpus is presented, a trilingual (Catalan, English, Spanish) speech database of neutral and emotional material collected for analysis and synthesis purposes. The corpus is made up of six different subsets of material: a neutral subcorpus, containing emotionless utterances; a ‘dialog’ subcorpus, containing typical call center utterances; an ‘emotional’ corpus, a set of sentences representative of pure emotional states; a ‘football’ subcorpus, including utterances imitating a football broadcasting situation; an ‘SMS’ subcorpus, including readings of SMS texts; and a ‘paralinguistic elements’ corpus, including recordings of interjections and paralinguistic sounds uttered in isolation. The corpus was read by professional speakers (male, in the case of Spanish and Catalan; female, in the case of the English corpus), carefully selected to meet criteria of language competence, voice quality and acting conditions. It is the result of a collaboration between the Speech Technology Group at Telefónica Investigación y Desarrollo (TID) and the Speech and Language Group at Barcelona Media Centre d'Innovació (BM), as part of the I3Media project.