Abstract
We present a code-switching corpus of Turkish-German that is collected by recording conversations of bilinguals. The recordings are then transcribed in two layers following speech and orthography conventions, and annotated with sentence boundaries and intersentential, intrasentential, and intra-word switch points. The total amount of data is 5 hours of speech which corresponds to 3614 sentences. The corpus aims at serving as a resource for speech or text analysis, as well as a collection for linguistic inquiries.

- Anthology ID: W17-0804
- Volume: Proceedings of the 11th Linguistic Annotation Workshop
- Month: April
- Year: 2017
- Address: Valencia, Spain
- Editors: Nathan Schneider, Nianwen Xue
- Venue: LAW
- SIG: SIGANN
- Publisher: Association for Computational Linguistics
- Pages: 34–40
- URL: https://aclanthology.org/W17-0804
- DOI: 10.18653/v1/W17-0804
- Cite (ACL): Özlem Çetinoğlu. 2017. A Code-Switching Corpus of Turkish-German Conversations. In Proceedings of the 11th Linguistic Annotation Workshop, pages 34–40, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal): A Code-Switching Corpus of Turkish-German Conversations (Çetinoğlu, LAW 2017)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/W17-0804.pdf