2008
German Today: an areally extensive Corpus of Spoken Standard German
Caren Brinckmann, Stefan Kleiner, Ralf Knöbl, Nina Berend
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The research project German Today aims to determine the amount of regional variation in (near-)standard German spoken by young and older educated adults and to identify and locate regional features. To this end, we compile an areally extensive corpus of read and spontaneous German speech. Secondary school students and 50-to-60-year-old locals are recorded in 160 cities throughout the German-speaking area of Europe. All participants read a number of short texts and a word list, name pictures, translate words and sentences from English, answer questions in a sociobiographic interview, and take part in a map task experiment. The resulting corpus comprises over 1,000 hours of speech, which is transcribed orthographically. Automatically derived broad phonetic transcriptions, selective manual narrow phonetic transcriptions, and variationalist annotations are added. Focussing on phonetic variation, we aim to show to what extent national or regional standards exist in spoken German. Furthermore, the linguistic variation due to different contextual styles (read vs. spontaneous speech) will be analysed. Finally, the corpus enables us to investigate whether linguistic change has occurred in spoken (near-)standard German.
memasysco: XML schema based metadata management system for speech corpora
Joachim Gasch, Caren Brinckmann, Sylvia Dickgießer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The metadata management system for speech corpora memasysco has been developed at the Institut für Deutsche Sprache (IDS) and is applied for the first time to document the speech corpus German Today. memasysco is based on a data model for the documentation of speech corpora and contains two generic XML schemas that drive data capture, XML native database storage, dynamic publishing, and information retrieval. The development of memasysco's information architecture was mainly based on the ISLE MetaData Initiative (IMDI) guidelines for publishing metadata of linguistic resources. However, since we also have to support the corpus management process in research projects at the IDS, we need a finer atomic granularity for some documentation components as well as more restrictive categories to ensure data integrity. The XML metadata of different speech corpus projects are centrally validated and natively stored in an Oracle XML database. The extension of the system to the management of annotations of audio and video signals (e.g. orthographic and phonetic transcriptions) is planned for the near future.
2004
Multi-dimensional annotation of linguistic corpora for investigating information structure
Stefan Baumann, Caren Brinckmann, Silvia Hansen-Schirra, Geert-Jan Kruijff, Ivana Kruijff-Korbayová, Stella Neumann, Elke Teich
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004
The MULI Project: Annotation and Analysis of Information Structure in German and English
Stefan Baumann, Caren Brinckmann, Silvia Hansen-Schirra, Geert-Jan Kruijff, Ivana Kruijff-Korbayová, Stella Neumann, Erich Steiner, Elke Teich, Hans Uszkoreit
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)