Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data
Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, Stephen Mayhew
Abstract
In this work, we address the problem of Named Entity Recognition (NER) in code-switched tweets as part of the Workshop on Computational Approaches to Linguistic Code-Switching (CALCS) at ACL 2018. Code-switching is the phenomenon in which a speaker switches between two languages, or between variants of the same language, within or across utterances; these are known as intra-sentential and inter-sentential code-switching, respectively. Such data is challenging for state-of-the-art methods, since that technology is generally geared toward monolingual text. In this paper, we explore ways to use language identification and translation to recognize named entities in such data; however, simple features (without multilingual features) with a Conditional Random Field (CRF) classifier achieved the best results. Our experiments focused mainly on the English-Spanish (ENG-SPA) dataset, but we also submitted a language-independent version of our system for the Arabic-Egyptian (MSA-EGY) dataset and achieved good results.

- Anthology ID: W18-3213
- Volume: Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
- Month: July
- Year: 2018
- Address: Melbourne, Australia
- Editors: Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Thamar Solorio, Mona Diab, Julia Hirschberg
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 103–109
- URL: https://aclanthology.org/W18-3213
- DOI: 10.18653/v1/W18-3213
- Cite (ACL): Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, and Stephen Mayhew. 2018. Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching, pages 103–109, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal): Simple Features for Strong Performance on Named Entity Recognition in Code-Switched Twitter Data (Jain et al., ACL 2018)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/W18-3213.pdf
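The abstract credits "simple features" with a CRF classifier but does not list them, so the following is only a minimal, hypothetical sketch of the kind of per-token features such a tweet tagger might use: the lowercased form, 3-character prefix and suffix, a collapsed word shape, capitalization and digit flags, Twitter markers (@mentions, #hashtags, URLs), and the neighbouring tokens. The function names `word_shape`, `token_features`, and `sentence_features`, and the exact feature set, are assumptions for illustration and not the authors' system. The output is one feature dict per token, the input format that CRF wrappers such as sklearn-crfsuite accept.

```python
import re
from typing import Dict, List


def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, e.g. 'Madrid' -> 'Xx', 'ACL18' -> 'Xd'."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        elif ch.isalpha():
            c = "a"  # caseless scripts such as Arabic
        else:
            c = ch  # keep punctuation like '#' or '@' as-is
        # Merge runs of the same character class.
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical 'simple' features for token i; no language-ID or translation features."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "shape": word_shape(tok),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "is_digit": tok.isdigit(),
        "is_mention": tok.startswith("@"),
        "is_hashtag": tok.startswith("#"),
        "is_url": bool(re.match(r"https?://", tok)),
    }
    # One-token context window on each side.
    if i > 0:
        feats["-1:lower"] = tokens[i - 1].lower()
        feats["-1:is_title"] = tokens[i - 1].istitle()
    else:
        feats["BOS"] = True  # beginning of tweet
    if i < len(tokens) - 1:
        feats["+1:lower"] = tokens[i + 1].lower()
        feats["+1:is_title"] = tokens[i + 1].istitle()
    else:
        feats["EOS"] = True  # end of tweet
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one tokenized tweet."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    tweet = ["@user", "vamos", "a", "Madrid", "this", "weekend", "#ACL2018"]
    for tok, f in zip(tweet, sentence_features(tweet)):
        print(tok, f["shape"], f["is_title"])
```

Because none of these features relies on a language-specific resource, an extractor along these lines could in principle be reused unchanged on another language pair, which is in the spirit of the language-independent submission the abstract mentions.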