Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes
Hanne Fersøe, Elviira Hartikainen, Henk van den Heuvel, Giulio Maltese, Asunción Moreno, Shaunie Shammass, Ute Ziegenhain
Abstract
This paper presents specifications and requirements for the creation and validation of large lexica needed in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and statistical Speech-to-Speech Translation (SST) systems. The language resources are being created and validated within the EU project LC-STAR (Lexica and Corpora for Speech-to-Speech Translation Components), which runs from 2002 to 2005. Large lexica with phonetic, suprasegmental and morpho-syntactic content will be provided, together with well-documented specifications, for 13 languages. The paper gives a short summary of the LC-STAR project, followed by an overview of the specification for corpora collection and word extraction and of the specification and format of the lexica. Particular attention is paid to the validation of the produced lexica and to the lessons learnt during pre-validation. The created and validated language resources will be available via ELRA/ELDA.
- Anthology ID:
- L04-1268
- Volume:
- Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
- Month:
- May
- Year:
- 2004
- Address:
- Lisbon, Portugal
- Editors:
- Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa, Raquel Silva
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2004/pdf/452.pdf
- Cite (ACL):
- Hanne Fersøe, Elviira Hartikainen, Henk van den Heuvel, Giulio Maltese, Asunción Moreno, Shaunie Shammass, and Ute Ziegenhain. 2004. Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), Lisbon, Portugal. European Language Resources Association (ELRA).
- Cite (Informal):
- Creation and Validation of Large Lexica for Speech-to-Speech Translation Purposes (Fersøe et al., LREC 2004)