Vulnerability in Acquisition, Language Impairments in Dutch: Creating a VALID Data Archive

Jetske Klatter, Roeland van Hout, Henk van den Heuvel, Paula Fikkert, Anne Baker, Jan de Jong, Frank Wijnen, Eric Sanders, Paul Trilsbeek


Abstract
The VALID Data Archive is an open multimedia data archive (under construction) with data from speakers suffering from language impairments. We report on a pilot project in the CLARIN-NL framework in which five data resources were curated. For all data sets concerned, written informed consent was obtained from the participants or their caretakers. All materials were anonymized. The audio files were converted into WAV (linear PCM) files and the transcriptions into CHAT or ELAN format. Research data that consisted of test, SPSS and Excel files were documented and converted into CSV files. All data sets received appropriate CMDI metadata files. A new CMDI metadata profile for this type of data resource was established, and care was taken to use ISOcat metadata categories to optimize interoperability. After curation, all data are deposited at the Max Planck Institute for Psycholinguistics in Nijmegen, where persistent identifiers are linked to all resources. The content of the transcriptions in CHAT and plain text format can be searched with the TROVA search engine.
Anthology ID:
L14-1682
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
357–364
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/89_Paper.pdf
Cite (ACL):
Jetske Klatter, Roeland van Hout, Henk van den Heuvel, Paula Fikkert, Anne Baker, Jan de Jong, Frank Wijnen, Eric Sanders, and Paul Trilsbeek. 2014. Vulnerability in Acquisition, Language Impairments in Dutch: Creating a VALID Data Archive. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 357–364, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Vulnerability in Acquisition, Language Impairments in Dutch: Creating a VALID Data Archive (Klatter et al., LREC 2014)