Abstract
Linguistic Code Switching (CS) is a phenomenon that occurs when multilingual speakers alternate between two or more languages/dialects within a single conversation. Processing intra-sentential CS data is especially challenging for state-of-the-art monolingual NLP technologies, since such technologies are geared toward processing one language at a time. In this paper, we address the problem of Part-of-Speech (POS) tagging in the context of CS. We explore leveraging multiple neural network architectures to measure the impact of different pre-trained embedding methods on POS tagging of CS data. We investigate the landscape in four CS language pairs: Spanish-English, Hindi-English, Modern Standard Arabic-Egyptian Arabic dialect (MSA-EGY), and Modern Standard Arabic-Levantine Arabic dialect (MSA-LEV). Our results show that multilingual embeddings (e.g., MSA-EGY and MSA-LEV) help closely related languages (EGY/LEV) but add noise to languages that are distant (SPA/HIN). Finally, we show that our proposed models outperform state-of-the-art CS taggers for the MSA-EGY language pair.
- Anthology ID:
- W19-1410
- Volume:
- Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- June
- Year:
- 2019
- Address:
- Ann Arbor, Michigan
- Editors:
- Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 99–109
- URL:
- https://aclanthology.org/W19-1410
- DOI:
- 10.18653/v1/W19-1410
- Cite (ACL):
- Fahad AlGhamdi and Mona Diab. 2019. Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 99–109, Ann Arbor, Michigan. Association for Computational Linguistics.
- Cite (Informal):
- Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data (AlGhamdi & Diab, VarDial 2019)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/W19-1410.pdf