Abstract
A new type of language resource called 'BAStat' has been released by the Bavarian Archive for Speech Signals at Ludwig Maximilians Universitaet, Munich. In contrast to primary resources like speech and text corpora BAStat comprises statistical estimates based on a number of primary spoken language resources: first and second order occurrence probability of phones, syllables and words, duration statistics, probabilities of pronunciation variants of words and probabilities of context information. Unlike other statistical speech resources BAStat is based solely on recordings of conversational German and therefore models spoken language not text. The resource consists of a bundle of 7-bit ASCII tables and matrices to maximize inter-operability between different operation systems and can be downloaded for free from the BAS web-site. This contribution gives a detailed description about the empirical basis, the contained data types, the format of the resulting statistical data, some interesting interpretations of grand figures and a brief comparison to the text-based statistical resource CELEX.- Anthology ID:
- L10-1191
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/277_Paper.pdf
- DOI:
- Cite (ACL):
- Florian Schiel. 2010. BAStat : New Statistical Resources at the Bavarian Archive for Speech Signals. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- BAStat : New Statistical Resources at the Bavarian Archive for Speech Signals (Schiel, LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/277_Paper.pdf