A Phonemic Corpus of Polish Child-Directed Speech

Luc Boruta, Justyna Jastrzebska


Abstract
Recent advances in modeling early language acquisition are due not only to the development of machine-learning techniques, but also to the increasing availability of data on child language and child-adult interaction. In the absence of recordings of child-directed speech, or when models explicitly require a phonemic representation of their training data, phonemic transcriptions are commonly used as input. We present a novel (to our knowledge, the first) phonemic corpus of Polish child-directed speech. It is derived from the Weist corpus of Polish, which is freely available from the seminal CHILDES database. For the sake of reproducibility, and to exemplify the typical trade-off between ecological validity and sample size, we report all preprocessing operations and transcription guidelines. Contributed linguistic resources include updated CHAT-formatted transcripts with phonemic transcriptions in a novel phonology tier, as well as by-product data, such as a phonemic lexicon of Polish. All resources are distributed under the LGPL-LR license.
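To illustrate how a phonemic lexicon can fall out of transcripts that carry a phonology tier, here is a minimal sketch. It is not the authors' procedure. It assumes, hypothetically, that each CHAT main tier (a line such as `*MOT:`) is followed by a dependent tier named `%pho` holding one space-separated phonemic word per orthographic word, in the same order. The function name `build_lexicon`, the `%pho` tier name, and the sample utterances are all illustrative assumptions, not data from the corpus.

```python
from collections import Counter, defaultdict

# Punctuation tokens that end CHAT utterances; dropped before alignment.
TERMINATORS = {".", "?", "!"}


def build_lexicon(chat_lines, pho_tier="%pho"):
    """Hypothetical sketch: map each orthographic word to counts of its phonemic forms.

    Assumes each main tier (line starting with '*') is followed by a dependent
    tier named `pho_tier` with one phonemic word per orthographic word.
    """
    lexicon = defaultdict(Counter)
    current_words = None
    for line in chat_lines:
        if line.startswith("*"):
            # Main tier, e.g. "*MOT:\tdaj mi to ." -> strip speaker code and terminators.
            _, _, utterance = line.partition(":")
            current_words = [w for w in utterance.split() if w not in TERMINATORS]
        elif line.startswith(pho_tier + ":") and current_words is not None:
            phonemic_words = line[len(pho_tier) + 1:].split()
            # Only pair words when the two tiers align one-to-one; otherwise skip.
            if len(phonemic_words) == len(current_words):
                for word, form in zip(current_words, phonemic_words):
                    lexicon[word.lower()][form] += 1
            current_words = None
    return lexicon


# Made-up two-utterance fragment in the assumed layout.
sample = [
    "*MOT:\tdaj mi to .",
    "%pho:\tdaj mi tɔ",
    "*MOT:\tmama tu .",
    "%pho:\tmama tu",
]
lexicon = build_lexicon(sample)
```

Counting forms per word, rather than storing a single form, keeps any transcription variants visible. Utterances whose tiers do not align are skipped instead of guessed at.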
Anthology ID:
L12-1660
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1017–1020
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1120_Paper.pdf
Cite (ACL):
Luc Boruta and Justyna Jastrzebska. 2012. A Phonemic Corpus of Polish Child-Directed Speech. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1017–1020, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A Phonemic Corpus of Polish Child-Directed Speech (Boruta & Jastrzebska, LREC 2012)