A Database of Age and Gender Annotated Telephone Speech

Felix Burkhardt, Martin Eckert, Wiebke Johannsen, Joachim Stegmann


Abstract
This article describes an age-annotated database of German telephone speech. All in all 47 hours of prompted and free text was recorded, uttered by 954 paid participants in a style typical for automated voice services. The participants were selected based on an equal distribution of males and females within four age cluster groups; children, youth, adults and seniors. Within the children, gender is not distinguished, because it doesn’t have a strong enough effect on the voice. The textual content was designed to be typical for automated voice services and consists mainly of short commands, single words and numbers. An additional database consists of 659 speakers (368 female and 291 male) that called an automated voice portal server and answered freely on one of the two questions “What is your favourite dish?” and “What would you take to an island?” (island set, 422 speakers). This data might be used for out-of domain testing. The data will be used to tune an age-detecting automated voice service and might be released to research institutes under controlled conditions as part of an open age and gender detection challenge.
Anthology ID:
L10-1178
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/262_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Felix Burkhardt, Martin Eckert, Wiebke Johannsen, and Joachim Stegmann. 2010. A Database of Age and Gender Annotated Telephone Speech. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
A Database of Age and Gender Annotated Telephone Speech (Burkhardt et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/262_Paper.pdf