Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments
Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, Niloy Ganguly
Abstract
Code-Switching (CS) between two languages is extremely common in communities with societal multilingualism, where speakers switch between two or more languages when interacting with each other. CS has been extensively studied in spoken language by linguists for several decades, but with the popularity of social media and less formal Computer Mediated Communication, we now see a big rise in the use of CS in text form. This poses interesting challenges and creates a need for computational processing of such code-switched data. As with any Computational Linguistics analysis or Natural Language Processing tool or application, annotated data is needed for understanding, processing, and generating code-switched language. In this study, we focus on CS between English and Hindi in tweets extracted from the Twitter stream of Hindi-English bilinguals. We present an annotation scheme for annotating the pragmatic functions of CS in Hindi-English (Hi-En) code-switched tweets, based on a linguistic analysis and some initial experiments.
- Anthology ID:
- L16-1260
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1644–1650
- URL:
- https://aclanthology.org/L16-1260
- Cite (ACL):
- Rafiya Begum, Kalika Bali, Monojit Choudhury, Koustav Rudra, and Niloy Ganguly. 2016. Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1644–1650, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Functions of Code-Switching in Tweets: An Annotation Framework and Some Initial Experiments (Begum et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/L16-1260.pdf