Eva Navas

Also published as: E. Navas


2016

A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
Xabier Sarasola | Eva Navas | David Tavarez | Daniel Erro | Ibon Saratxaga | Inma Hernaez
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the characteristics and structure of a Basque singing voice database of bertsolaritza. Bertsolaritza is a popular singing style from the Basque Country, sung exclusively in Basque, that is improvised and a cappella. The database is designed to be used in statistical singing voice synthesis for the bertsolaritza style. Starting from the recordings and transcriptions of numerous singers, diarization and phoneme alignment experiments have been carried out to extract the singing voice from the recordings and create phoneme alignments. These labelling processes have been performed by applying standard speech processing techniques, and the results show that these techniques can be used in this specific singing style.

2014

Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque
Igor Odriozola | Inma Hernaez | María Inés Torres | Luis Javier Rodriguez-Fuentes | Mikel Penagarikano | Eva Navas
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the latter was recorded through mobile telephones according to the SpeechDat specifications. Both databases were created under an initiative that the Basque Government started in 2005, a program called ADITU, which aimed at developing speech technologies for Basque. The databases belong to the Basque Government. A comprehensive description of both databases is provided in this work, highlighting the differences with regard to their corresponding standard specifications. The paper also presents several initial experimental results for both databases with the purpose of validating their usefulness for the development of speech recognition technology. Several applications already developed with the Basque Speecon-like database are also described. The authors also aim to make these databases widely known to the community and to foster their use by other groups.

New bilingual speech databases for audio diarization
David Tavarez | Eva Navas | Daniel Erro | Ibon Saratxaga | Inma Hernaez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the process of collecting and recording two new bilingual speech databases in Spanish and Basque. They are designed primarily for speaker diarization in two different application domains: broadcast news audio and recorded meetings. First, both databases have been manually segmented. Next, several diarization experiments have been carried out in order to evaluate them. Our baseline speaker diarization system has been applied to both databases, obtaining a DER of around 30% for broadcast news audio and around 40% for recorded meetings. The behavior of the system when the same speaker uses different languages has also been tested.

2012

Versatile Speech Databases for High Quality Synthesis for Basque
Iñaki Sainz | Daniel Erro | Eva Navas | Inma Hernáez | Jon Sanchez | Ibon Saratxaga | Igor Odriozola
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents three new speech databases for standard Basque. They are designed primarily for corpus-based synthesis, but each database has its specific purpose: 1) AhoSyn: high quality speech synthesis (recorded also in Spanish), 2) AhoSpeakers: voice conversion and 3) AhoEmo3: emotional speech synthesis. The whole corpus design and the recording process are described in detail. Once the databases were collected, all the data was automatically labelled and annotated. Then, an HMM-based TTS voice was built and subjectively evaluated. The results of the evaluation are quite satisfactory: 3.70 MOS for Basque and 3.44 for Spanish. Therefore, the evaluation supports the quality of this new speech resource and the validity of the automated processing presented.

Strategies to Improve a Speaker Diarisation Tool
David Tavarez | Eva Navas | Daniel Erro | Ibon Saratxaga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the different strategies used to improve the results obtained by an off-line speaker diarisation tool with the Albayzin 2010 diarisation database. The errors made by the system have been analyzed and different strategies have been proposed to reduce each kind of error. Very short segments incorrectly labelled and different appearances of one speaker labelled with different identifiers are the most common errors. A post-processing module that refines the segmentation by retraining the GMM models of the speakers involved has been built to cope with these errors. This post-processing module has been tuned with the training dataset and improves the result of the diarisation system by 16.4% in the test dataset.

Using an ASR database to design a pronunciation evaluation system in Basque
Igor Odriozola | Eva Navas | Inma Hernaez | Iñaki Sainz | Ibon Saratxaga | Jon Sánchez | Daniel Erro
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a method to build CAPT systems for under-resourced languages, such as Basque, using a general purpose ASR speech database. More precisely, the proposed method consists in automatically determining the threshold of GOP (Goodness Of Pronunciation) scores, which have been used as pronunciation scores at phone level. Two score distributions have been obtained for each phoneme, corresponding to its correct and incorrect pronunciations. The distribution of the scores for erroneous pronunciation has been calculated by inserting controlled errors in the dictionary, so that each changed phoneme has been randomly replaced by a phoneme from the same group. These groups have been obtained by means of a phonetic clustering performed using regression trees. After obtaining both distributions, the EER (Equal Error Rate) of each distribution pair has been calculated and used as a decision threshold for each phoneme. The results show that this method is useful when there is no database specifically designed for CAPT systems, although it is not as accurate as those specifically designed for this purpose.
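
As a rough illustration of the thresholding step described in this abstract, the following Python sketch picks, for one phoneme, the score threshold at which the false acceptance and false rejection rates are closest. It is not the authors' implementation: the function name `eer_threshold` is hypothetical, and it assumes that higher GOP scores indicate better pronunciation and that a pronunciation is accepted when its score is at or above the threshold.

```python
from typing import Sequence, Tuple


def eer_threshold(correct: Sequence[float],
                  incorrect: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, approximate EER) for one phoneme.

    Assumptions (not taken from the paper): higher scores mean better
    pronunciation, and a score >= threshold is accepted as correct.
    """
    if not correct or not incorrect:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_threshold = 0.0
    best_eer = 0.0
    # Every observed score is a candidate threshold.
    for t in sorted(set(correct) | set(incorrect)):
        # False rejection: a correct pronunciation scored below t.
        frr = sum(s < t for s in correct) / len(correct)
        # False acceptance: an incorrect pronunciation scored at or above t.
        far = sum(s >= t for s in incorrect) / len(incorrect)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_threshold = t
            best_eer = (far + frr) / 2
    return best_threshold, best_eer


# Toy scores for a single phoneme (made-up values for illustration only).
threshold, eer = eer_threshold(
    correct=[-1.0, -0.5, -0.2, -0.1],
    incorrect=[-3.0, -2.0, -1.5, -0.3],
)
print(threshold, eer)
```

Running the function once per phoneme, with that phoneme's two score lists, would give one decision threshold per phoneme, which matches the per-phoneme use of the EER described above.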

BUCEADOR, a multi-language search engine for digital libraries
Jordi Adell | Antonio Bonafonte | Antonio Cardenal | Marta R. Costa-Jussà | José A. R. Fonollosa | Asunción Moreno | Eva Navas | Eduardo R. Banga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented which is able to retrieve information from a digital library made of multimedia documents in the 4 official languages in Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user language after translation and dubbing (the four previous languages + English). The paper presents the tool functionality, the architecture and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the presented tool as well as to interact with the rest of the technologies involved.

2010

TTS Evaluation Campaign with a Common Spanish Database
Iñaki Sainz | Eva Navas | Inma Hernáez | Antonio Bonafonte | Francisco Campillo
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the first TTS evaluation campaign designed for Spanish. Seven research institutions took part in the evaluation campaign and developed a voice from a common speech database provided by the organisation. Each participating team had a period of seven weeks to generate a voice. Next, a set of sentences was released and each team had to synthesise them within one week. Finally, some of the synthesised test audio files were subjectively evaluated via an online test according to the following criteria: similarity to the original voice, naturalness and intelligibility. Box-plots, Wilcoxon tests and WER have been generated in order to analyse the results. Two main conclusions can be drawn. On the one hand, there is considerable margin for improvement to reach the quality level of the natural voice. On the other hand, two systems get significantly better results than the rest: one is based on statistical parametric synthesis and the other is a concatenative system that makes use of a sinusoidal model both to modify prosody and to smooth spectral joints. Therefore, it seems that some kind of spectral control is needed when building voices with a medium size database for unrestricted domains.

AhoTransf: A Tool for Multiband Excitation Based Speech Analysis and Modification
Ibon Saratxaga | Inmaculada Hernáez | Eva Navas | Iñaki Sainz | Iker Luengo | Jon Sánchez | Igor Odriozola | Daniel Erro
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present AhoTransf, a tool that enables analysis, visualization, modification and synthesis of speech. AhoTransf integrates a speech signal analysis model with a graphical user interface to allow visualization and modification of the parameters of the model. The synthesis capability allows hearing the modified signal, thus providing a quick way to understand the perceptual effect of the changes in the parameters of the model. The speech analysis/synthesis algorithm is based on the Multiband Excitation technique, but uses a novel phase information representation, the Relative Phase Shifts (RPSs). With this representation, not only the amplitudes but also the phases of the harmonic components of the speech signal reveal their structured patterns in the visualization tool. AhoTransf is modularly conceived so that it can be used with different harmonic speech models.

Modified LTSE-VAD Algorithm for Applications Requiring Reduced Silence Frame Misclassification
Iker Luengo | Eva Navas | Igor Odriozola | Ibon Saratxaga | Inmaculada Hernaez | Iñaki Sainz | Daniel Erro
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The LTSE-VAD is one of the best known algorithms for voice activity detection. In this paper we present a modified version of this algorithm that bases the VAD decision not on the estimated background noise level but on the signal-to-noise ratio (SNR). This makes the algorithm robust not only to noise level changes, but also to signal level changes. We compare the modified algorithm with the original one, and with three other standard VAD systems. The results show that the modified version gets the lowest silence misclassification rate, while maintaining a reasonably low speech misclassification rate. As a result, this algorithm is more suitable for identification tasks, such as speaker or emotion recognition, where silence misclassification can be very harmful. A series of automatic emotion identification experiments are also carried out, showing that the modified version of the algorithm helps increase the correct emotion classification rate.

2008

Text Independent Speaker Identification in Multilingual Environments
Iker Luengo | Eva Navas | Iñaki Sainz | Ibon Saratxaga | Jon Sanchez | Igor Odriozola | Inma Hernaez
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Speaker identification and verification systems perform poorly when model training is done in one language while the testing is done in another. This situation is not unusual in multilingual environments, where users should be able to access the system in whichever language they prefer at any moment, without noticing a performance drop. In this work we study the possibility of using features derived from prosodic parameters in order to reinforce the language robustness of these systems. First, the features’ properties in terms of language and session variability are studied, predicting an increase in the language robustness when frame-wise intonation and energy values are combined with traditional MFCC features. The experimental results confirm that these features provide an improvement in the speaker recognition rates under language-mismatch conditions. The whole study is carried out in the Basque Country, a bilingual region in which the Basque and Spanish languages co-exist.

Subjective Evaluation of an Emotional Speech Database for Basque
Iñaki Sainz | Ibon Saratxaga | Eva Navas | Inmaculada Hernáez | Jon Sanchez | Iker Luengo | Igor Odriozola
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the evaluation process of an emotional speech database recorded for standard Basque, in order to determine its adequacy for the analysis of emotional models and its use in speech synthesis. The corpus consists of seven hundred semantically neutral sentences that were recorded for the Big Six emotions and neutral style, by two professional actors. The test results show that every emotion is readily recognized far above chance level for both speakers. Therefore the database is a valid linguistic resource for the research and development purposes it was designed for.

2006

Designing and Recording an Emotional Speech Database for Corpus Based Synthesis in Basque
Ibon Saratxaga | Eva Navas | Inmaculada Hernáez | Iker Aholab
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes an emotional speech database recorded for standard Basque. The database has been designed with the twofold purpose of being used for corpus based synthesis and of allowing the study of prosodic models for the emotions. The database is thus large, in order to achieve good corpus based synthesis quality, and contains the same texts recorded in the six basic emotions plus the neutral style. The recordings were carried out by two professional dubbing actors, a man and a woman. The paper explains the whole creation process, beginning with the design stage, continuing with the corpus creation and the recording phases, and finishing with some lessons learned and hints.

2004

Designing and Recording an Audiovisual Database of Emotional Speech in Basque
Eva Navas | Amaia Castelruiz | Iker Luengo | Jon Sánchez | Inmaculada Hernáez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

BIZKAIFON: A sound archive of dialectal varieties of spoken Basque
I. Hernáez | E. Navas | J. Sánchez | I. Madariaga | I. Gaminde | X. Zalbide
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)