A Code-Switching Corpus of Turkish-German Conversations

Özlem Çetinoğlu


Abstract
We present a Turkish-German code-switching corpus collected by recording conversations of bilinguals. The recordings are transcribed in two layers, following speech and orthography conventions, and annotated with sentence boundaries and with intersentential, intrasentential, and intra-word switch points. The corpus comprises 5 hours of speech, corresponding to 3614 sentences. It is intended as a resource for speech or text analysis, as well as a collection for linguistic inquiry.
Anthology ID:
W17-0804
Volume:
Proceedings of the 11th Linguistic Annotation Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Venue:
LAW
SIG:
SIGANN
Publisher:
Association for Computational Linguistics
Pages:
34–40
URL:
https://aclanthology.org/W17-0804
DOI:
10.18653/v1/W17-0804
Cite (ACL):
Özlem Çetinoğlu. 2017. A Code-Switching Corpus of Turkish-German Conversations. In Proceedings of the 11th Linguistic Annotation Workshop, pages 34–40, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
A Code-Switching Corpus of Turkish-German Conversations (Çetinoğlu, LAW 2017)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W17-0804.pdf