A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS)

Eline Westerhout, Paola Monachesi


Abstract
In this paper, a pilot study for the development of a corpus of Dutch Aphasic Speech (CoDAS) is presented. Given the lack of resources of this kind not only for Dutch but also for other languages, CoDAS will be able to set standards and will contribute to the future research in this area. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and annotation protocols of existing corpora, such as the Corpus Gesproken Nederlands or CHILDES. However, they have been assumed as starting point. We have investigated whether and how the procedures and protocols for the annotation (part-of-speech tagging) and transcription (orthographic and phonetic) used for the CGN should be adapted in order to annotate and transcribe aphasic speech properly. Besides, we have established the basic requirements with respect to text types, metadata, and annotation levels that CoDAS should fulfill.
Anthology ID:
L06-1417
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/672_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Eline Westerhout and Paola Monachesi. 2006. A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS). In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
A pilot study for a Corpus of Dutch Aphasic Speech (CoDAS) (Westerhout & Monachesi, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/672_pdf.pdf