David Tavarez
2016
A Singing Voice Database in Basque for Statistical Singing Synthesis of Bertsolaritza
Xabier Sarasola | Eva Navas | David Tavarez | Daniel Erro | Ibon Saratxaga | Inma Hernaez
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes the characteristics and structure of a Basque singing voice database of bertsolaritza. Bertsolaritza is a popular singing style from the Basque Country, sung exclusively in Basque, that is improvised and a cappella. The database is designed to be used in statistical singing voice synthesis for the bertsolaritza style. Starting from the recordings and transcriptions of numerous singers, diarization and phoneme alignment experiments have been carried out to extract the singing voice from the recordings and create phoneme alignments. These labelling processes have been performed by applying standard speech processing techniques, and the results show that these techniques can be used in this specific singing style.
2014
New bilingual speech databases for audio diarization
David Tavarez | Eva Navas | Daniel Erro | Ibon Saratxaga | Inma Hernaez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the process of collecting and recording two new bilingual speech databases in Spanish and Basque. They are designed primarily for speaker diarization in two different application domains: broadcast news audio and recorded meetings. First, both databases have been manually segmented. Next, several diarization experiments have been carried out in order to evaluate them. Our baseline speaker diarization system has been applied to both databases, yielding a diarization error rate (DER) of around 30% for broadcast news audio and around 40% for recorded meetings. The behavior of the system when the same speaker uses different languages has also been tested.
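The DER figures quoted in this abstract follow the usual definition of the metric: the time attributed to missed speech, false alarm speech and speaker confusion, divided by the total reference speech time. A minimal sketch of that ratio is given below. The function name and the durations in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the authors' scoring tool.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a fraction of the total reference speech time.

    All arguments are durations in seconds:
      missed          - reference speech the system labelled as non-speech
      false_alarm     - non-speech the system labelled as speech
      confusion       - speech attributed to the wrong speaker
      total_reference - total speech time in the reference segmentation
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical durations (seconds), for illustration only.
der = diarization_error_rate(
    missed=60.0, false_alarm=30.0, confusion=90.0, total_reference=600.0
)
print(f"DER = {der:.1%}")
```

Because false alarm time is counted in the numerator but not in the denominator, DER can exceed 100% for a system that produces a lot of spurious speech.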
2012
Strategies to Improve a Speaker Diarisation Tool
David Tavarez | Eva Navas | Daniel Erro | Ibon Saratxaga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes the different strategies used to improve the results obtained by an off-line speaker diarisation tool with the Albayzin 2010 diarisation database. The errors made by the system have been analysed and different strategies have been proposed to reduce each kind of error. The most common errors are very short segments that are incorrectly labelled and different appearances of the same speaker labelled with different identifiers. A post-processing module that refines the segmentation by retraining the GMMs of the speakers involved has been built to cope with these errors. This post-processing module has been tuned with the training dataset and improves the result of the diarisation system by 16.4% on the test dataset.