2016
Investigating Cross-lingual Multi-level Adaptive Networks: The Importance of the Correlation of Source and Target Languages
Alexandros Lazaridis | Ivan Himawan | Petr Motlicek | Iosif Mporas | Philip N. Garner
Proceedings of the 13th International Conference on Spoken Language Translation
The multi-level adaptive networks (MLAN) technique is a cross-lingual adaptation framework in which a bottleneck (BN) layer of a deep neural network (DNN) trained on a source language is used to produce BN features that are then exploited by a second DNN for a target language. We investigate how the correlation (in the sense of phonetic similarity) between the source and target languages, and the amount of source-language data, affect the efficiency of MLAN schemes. We experiment with three scenarios using i) French, as a source language uncorrelated to the target language, ii) Ukrainian, as a source language correlated to the target language, and iii) English, as a source language uncorrelated to the target language but with a relatively large amount of data compared with the other two scenarios. In all cases, Russian is the target language. GLOBALPHONE data is used, except for English, for which a mixture of LIBRISPEECH, TEDLIUM and AMIDA is used. The results show that both factors are important for MLAN schemes. Specifically, on the one hand, when a modest amount of source-language data is used, the correlation between the source and target languages is very important. On the other hand, the correlation seems to be less important when a relatively large amount of source-language data is used. The best word error rate (WER) was achieved when English was used as the source language in the multi-task MLAN scheme, with a relative improvement of 9.4% with respect to the baseline DNN model.
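The MLAN data flow summarized above (a source-language DNN with a BN layer feeding a target-language DNN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the sigmoid hidden layer, the linear bottleneck, and concatenating the BN features with the original acoustic frame (a common MLAN arrangement) are all assumptions. The weights are random rather than trained, and the names `SourceBottleneckDNN` and `mlan_input` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    return z


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Random weights for illustration only; a real system trains these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


class SourceBottleneckDNN:
    """Hypothetical source-language DNN truncated at its bottleneck layer.

    Only the layers up to the bottleneck are needed to extract BN features,
    so the source-language output layer is omitted. Sizes are assumptions.
    """

    def __init__(self, n_in, n_hidden, n_bn, seed=0):
        rng = random.Random(seed)
        self.hidden = random_layer(n_in, n_hidden, rng)
        self.bottleneck = random_layer(n_hidden, n_bn, rng)

    def bn_features(self, frame):
        h = dense(frame, *self.hidden, sigmoid)
        # Assumption: linear (non-saturating) bottleneck activations.
        return dense(h, *self.bottleneck, linear)


def mlan_input(frame, source_dnn):
    """Input for the target-language DNN: acoustic frame + source BN features."""
    return frame + source_dnn.bn_features(frame)


if __name__ == "__main__":
    dnn = SourceBottleneckDNN(n_in=39, n_hidden=64, n_bn=5)
    x = mlan_input([0.1] * 39, dnn)
    print(len(x))  # 39 acoustic dims + 5 BN dims
```

The second (target-language) DNN would then be trained on these concatenated vectors; in this sketch the source network is frozen, so the BN features depend only on the input frame.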
2010
Vergina: A Modern Greek Speech Database for Speech Synthesis
Alexandros Lazaridis | Theodoros Kostoulas | Todor Ganchev | Iosif Mporas | Nikos Fakotakis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The present paper outlines the Vergina speech database, which was developed to support research and development of corpus-based unit selection and statistical parametric speech synthesis systems for Modern Greek. We describe the design, development and implementation of the recording campaign, as well as the annotation of the database. Specifically, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals and literary passages, was processed in order to select the utterances (sentences) needed for the speech database and to achieve reasonable phonetic coverage. The broad coverage and content of the selected utterances (drawn from a text corpus spanning different domains and writing styles) make this database appropriate for various application domains. The database, recorded in an audio studio, consists of approximately 3,000 phonetically balanced Modern Greek utterances, corresponding to approximately four hours of speech. The Vergina speech database was annotated using task-specific tools based on a hidden Markov model (HMM) segmentation method, followed by manual inspection and correction.
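The abstract does not say how utterances were chosen to achieve phonetic coverage. One widely used approach, shown here purely as an assumption and not as the method used for Vergina, is greedy set cover: repeatedly pick the candidate sentence that adds the most not-yet-covered units. The sketch uses diphones (adjacent phone pairs) as the coverage unit; the names `diphones` and `greedy_select` and the toy phone strings are hypothetical.

```python
def diphones(phones):
    """Adjacent phone pairs of one utterance (the assumed coverage unit)."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def greedy_select(candidates, max_utterances):
    """Greedy set cover over diphones.

    candidates: dict mapping a sentence id to its list of phone symbols.
    Returns the selected ids in selection order, stopping when the budget
    is reached or no remaining candidate adds a new diphone.
    """
    covered = set()
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < max_utterances:
        best_id, best_gain = None, 0
        for sid, phones in remaining.items():
            gain = len(diphones(phones) - covered)
            if gain > best_gain:  # strict '>' keeps the first candidate on ties
                best_id, best_gain = sid, gain
        if best_id is None:  # nothing left adds new coverage
            break
        covered |= diphones(remaining.pop(best_id))
        selected.append(best_id)
    return selected


if __name__ == "__main__":
    toy = {
        "s1": ["a", "b", "c"],
        "s2": ["a", "b"],
        "s3": ["c", "d", "e", "f"],
    }
    print(greedy_select(toy, max_utterances=10))  # ['s3', 's1']
```

Here `s3` is chosen first (three new diphones), then `s1` (two new), and `s2` is skipped because its only diphone is already covered. Real selection schemes often weight rare units more heavily or cap sentence length; those refinements are omitted.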
2008
A Real-World Emotional Speech Corpus for Modern Greek
Theodoros Kostoulas | Todor Ganchev | Iosif Mporas | Nikos Fakotakis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The present paper deals with the design and annotation of a Greek real-world emotional speech corpus. The speech data consist of recordings collected during the interaction of naïve users with a smart-home dialogue system. The speech data were annotated with respect to the uttered command and the emotional state. Initial experiments on recognizing negative emotional states were performed, and the results indicate the range of difficulties involved in dealing with real-world data.