A large scale annotated child language construction database

Aline Villavicencio, Beracah Yankama, Marco Idiart, Robert Berwick


Abstract
Large-scale annotated corpora of child language can be of great value in assessing theoretical proposals regarding language acquisition models. For example, they can help determine whether the type and amount of data required by a proposed language acquisition model can actually be found in a naturalistic data sample. To this end, several recent efforts have augmented the CHILDES child language corpora with POS tagging and parsing information for languages such as English. With the increasing availability of robust NLP systems and electronic resources, these corpora can be further annotated with more detailed information about the properties of words, verb argument structure, and sentences. This paper describes such an initiative, which combines information from various sources to extend the annotation of the English CHILDES corpora with linguistic, psycholinguistic and distributional information, and gives an example of applying this approach to the extraction of verb alternation information. The end result, the English CHILDES Verb Construction Database, is an integrated resource containing information such as grammatical relations, verb semantic classes, and age of acquisition. It enables more targeted, complex searches across different levels of annotation, which can facilitate a more detailed analysis of the linguistic input available to children.
Anthology ID:
L12-1116
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2370–2374
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/276_Paper.pdf
Cite (ACL):
Aline Villavicencio, Beracah Yankama, Marco Idiart, and Robert Berwick. 2012. A large scale annotated child language construction database. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2370–2374, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A large scale annotated child language construction database (Villavicencio et al., LREC 2012)