Fahad AlGhamdi


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2019

pdf bib
Leveraging Pretrained Word Embeddings for Part-of-Speech Tagging of Code Switching Data
Fahad AlGhamdi | Mona Diab
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

Linguistic Code Switching (CS) is a phenomenon that occurs when multilingual speakers alternate between two or more languages/dialects within a single conversation. Processing CS data is especially challenging in intra-sentential data given state-of-the-art monolingual NLP technologies since such technologies are geared toward the processing of one language at a time. In this paper, we address the problem of Part-of-Speech tagging (POS) in the context of linguistic code switching (CS). We explore leveraging multiple neural network architectures to measure the impact of different pre-trained embeddings methods on POS tagging CS data. We investigate the landscape in four CS language pairs, Spanish-English, Hindi-English, Modern Standard Arabic- Egyptian Arabic dialect (MSA-EGY), and Modern Standard Arabic- Levantine Arabic dialect (MSA-LEV). Our results show that multilingual embedding (e.g., MSA-EGY and MSA-LEV) helps closely related languages (EGY/LEV) but adds noise to the languages that are distant (SPA/HIN). Finally, we show that our proposed models outperform state-of-the-art CS taggers for MSA-EGY language pair.

2018

pdf bib
WASA: A Web Application for Sequence Annotation
Fahad AlGhamdi | Mona Diab
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching
Gustavo Aguilar | Fahad AlGhamdi | Victor Soto | Thamar Solorio | Mona Diab | Julia Hirschberg
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

pdf bib
Named Entity Recognition on Code-Switched Data: Overview of the CALCS 2018 Shared Task
Gustavo Aguilar | Fahad AlGhamdi | Victor Soto | Mona Diab | Julia Hirschberg | Thamar Solorio
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data. We divide the shared task into two competitions based on the English-Spanish (ENG-SPA) and Modern Standard Arabic-Egyptian (MSA-EGY) language pairs. We use Twitter data and 9 entity types to establish a new dataset for code-switched NER benchmarks. In addition to the CS phenomenon, the diversity of the entities and the social media challenges make the task considerably hard to process. As a result, the best scores of the competitions are 63.76% and 71.61% for ENG-SPA and MSA-EGY, respectively. We present the scores of 9 participants and discuss the most common challenges among submissions.

2016

pdf bib
Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Mona Diab | Mahmoud Ghoneim | Abdelati Hawwari | Fahad AlGhamdi | Nada AlMarwani | Mohamed Al-Badrashiny
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present our effort to create a large Multi-Layered representational repository of Linguistic Code-Switched Arabic data. The process involves developing clear annotation standards and Guidelines, streamlining the annotation process, and implementing quality control measures. We used two main protocols for annotation: in-lab gold annotations and crowd sourcing annotations. We developed a web-based annotation tool to facilitate the management of the annotation process. The current version of the repository contains a total of 886,252 tokens that are tagged into one of sixteen code-switching tags. The data exhibits code switching between Modern Standard Arabic and Egyptian Dialectal Arabic representing three data genres: Tweets, commentaries, and discussion fora. The overall Inter-Annotator Agreement is 93.1%.

pdf bib
Overview for the Second Shared Task on Language Identification in Code-Switched Data
Giovanni Molina | Fahad AlGhamdi | Mahmoud Ghoneim | Abdelati Hawwari | Nicolas Rey-Villamizar | Mona Diab | Thamar Solorio
Proceedings of the Second Workshop on Computational Approaches to Code Switching

pdf bib
Part of Speech Tagging for Code Switched Data
Fahad AlGhamdi | Giovanni Molina | Mona Diab | Thamar Solorio | Abdelati Hawwari | Victor Soto | Julia Hirschberg
Proceedings of the Second Workshop on Computational Approaches to Code Switching

2014

pdf bib
Overview for the First Shared Task on Language Identification in Code-Switched Data
Thamar Solorio | Elizabeth Blair | Suraj Maharjan | Steven Bethard | Mona Diab | Mahmoud Ghoneim | Abdelati Hawwari | Fahad AlGhamdi | Julia Hirschberg | Alison Chang | Pascale Fung
Proceedings of the First Workshop on Computational Approaches to Code Switching