2008
LC-STAR II: Starring more Lexica
Ute Ziegenhain
|
Hanne Fersoe
|
Henk van den Heuvel
|
Asuncion Moreno
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
LC-STAR II is a follow-up project of the EU-funded project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components, IST-2001-32216). LC-STAR II develops large lexica containing information for speech processing in ten languages, targeting especially automatic speech recognition and text-to-speech synthesis but also other applications such as speech-to-speech translation and tagging. The project by and large follows the specifications developed within the scope of LC-STAR, which covered thirteen languages: Catalan, Finnish, German, Greek, Hebrew, Italian, Mandarin Chinese, Russian, Turkish, Slovenian, Spanish, Standard Arabic and US-English. The ten new LC-STAR II languages are: Brazilian-Portuguese, Cantonese, Czech, English-UK, French, Hindi, Polish, Portuguese, Slovak, and Urdu. The project started in 2006 with a lifetime of two years. The project is funded by a consortium, which includes Microsoft (USA), Nokia (Finland), NSC (Israel), Siemens (Germany) and Harman/Becker (Germany). The project is coordinated by UPC (Spain), and validation is performed by SPEX (The Netherlands) and CST (Denmark). The developed language resources will be shared among partners. This paper presents a summary of the creation of word lists and lexica and an overview of adaptations of the specifications and conceptual representation model from LC-STAR to the new languages. The validation procedure is also presented.
2006
TC-STAR: Specifications of Language Resources and Evaluation for Speech Synthesis
A. Bonafonte
|
H. Höge
|
I. Kiss
|
A. Moreno
|
U. Ziegenhain
|
H. van den Heuvel
|
H.-U. Hain
|
X. S. Wang
|
M. N. Garcia
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In the framework of the EU-funded project TC-STAR (Technology and Corpora for Speech to Speech Translation), research on TTS aims at providing a synthesized voice that sounds like the source speaker speaking the target language. To progress in this direction, research within the TC-STAR framework focuses on naturalness, intelligibility, expressivity and voice conversion. For this purpose, specifications for large, high-quality TTS databases have been developed, and the data have been recorded for UK English, Spanish and Mandarin. The development of speech technology in TC-STAR is evaluation driven. Assessment of speech synthesis is needed to determine how well a system or technique performs in comparison to previous versions as well as to other approaches (systems and methods). Apart from testing the whole system, all components of the system will be evaluated separately. This approach allows a better assessment of each component as well as identification of the best techniques in the different speech synthesis processes. This paper describes the specifications of Language Resources for speech synthesis and the specifications for evaluation of speech synthesis activities.
2004
Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Hanne Fersøe
|
Elviira Hartikainen
|
Henk van den Heuvel
|
Giulio Maltese
|
Asunción Moreno
|
Shaunie Shammass
|
Ute Ziegenhain
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
This paper presents specifications and requirements for the creation and validation of large lexica that are needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are created and validated within the scope of the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components) during the years 2002-2005. Large lexica consisting of phonetic, suprasegmental and morpho-syntactic content will be provided with well-documented specifications for 13 languages. A short summary of the LC-STAR project itself is presented. An overview of the specification for corpus collection and word extraction, as well as the specification and format of the lexica, is also presented. Particular attention is paid to the validation of the produced lexica and the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.
2000
PLEDIT - A New Efficient Tool for Management of Multilingual Pronunciation Lexica and Batchlists
Damjan Vlaj
|
Janez Kaiser
|
Ralph Wilhelm
|
Ute Ziegenhain
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)