Abstract
Code-switching (the alternating use of different languages within a single conversation) is a rapidly growing phenomenon in social media and colloquial usage, and it poses various challenges for natural language processing. This paper introduces the first study on detecting Turkish-English code-switching, together with a small test data set collected from social media to pave the way for further studies. The proposed system, which uses character-level n-grams and conditional random fields (CRFs), obtains a 95.6% micro-averaged F1-score on the introduced test data set.
 - Anthology ID:
 - W18-6115
 - Volume:
 - Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
 - Month:
 - November
 - Year:
 - 2018
 - Address:
 - Brussels, Belgium
 - Editors:
 - Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
 - Venue:
 - WNUT
 - Publisher:
 - Association for Computational Linguistics
 - Pages:
 - 110–115
 - URL:
 - https://aclanthology.org/W18-6115
 - DOI:
 - 10.18653/v1/W18-6115
 - Cite (ACL):
 - Zeynep Yirmibeşoğlu and Gülşen Eryiğit. 2018. Detecting Code-Switching between Turkish-English Language Pair. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 110–115, Brussels, Belgium. Association for Computational Linguistics.
 - Cite (Informal):
 - Detecting Code-Switching between Turkish-English Language Pair (Yirmibeşoğlu & Eryiğit, WNUT 2018)
 - PDF:
 - https://preview.aclanthology.org/ingest-acl-2023-videos/W18-6115.pdf