Mohammad Salameh


2019

pdf bib
ADIDA: Automatic Dialect Identification for Arabic
Ossama Obeid | Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text. The system distinguishes among the dialects of 25 Arab cities (from Rabat to Muscat) in addition to Modern Standard Arabic. The results are presented with either a point map or a heat map visualizing the automatic identification probabilities over a geographical map of the Arab World.

2018

pdf
Fine-Grained Arabic Dialect Identification
Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 27th International Conference on Computational Linguistics

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic – a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% for sentences with an average length of 7 words (a 9% relative error reduction over the state-of-the-art technique for Arabic dialect identification) and reach more than 90% when we consider 16 words. We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.

pdf bib
SemEval-2018 Task 1: Affect in Tweets
Saif Mohammad | Felipe Bravo-Marquez | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the 12th International Workshop on Semantic Evaluation

We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful. We also analyze systems for consistent bias towards a particular race or gender. The data is made freely available to further improve our understanding of how people convey emotions through language.

pdf
The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Unified Guidelines and Resources for Arabic Dialect Orthography
Nizar Habash | Fadhl Eryani | Salam Khalifa | Owen Rambow | Dana Abdulrahim | Alexander Erdmann | Reem Faraj | Wajdi Zaghouani | Houda Bouamor | Nasser Zalmout | Sara Hassan | Faisal Al-Shargi | Sakhar Alkhereyf | Basma Abdulkareem | Ramy Eskander | Mohammad Salameh | Hind Saddiki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf
Integrating Morphological Desegmentation into Phrase-based Decoding
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Sentiment Lexicons for Arabic Social Media
Saif Mohammad | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Existing Arabic sentiment lexicons have low coverage―with only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system. We compare the usefulness of new and old sentiment lexicons in the downstream application of sentence-level sentiment analysis. Our baseline sentiment analysis system uses numerous surface form features. Nonetheless, the system benefits from using additional features drawn from sentiment lexicons. The best result is obtained using the automatically generated Dialectal Hashtag Lexicon and the Arabic translations of the NRC Emotion Lexicon (accuracy of 66.6%). Finally, we describe a qualitative study of the automatic translations of English sentiment lexicons into Arabic, which shows that about 88% of the automatically translated entries are valid for English as well. Close to 10% of the invalid entries are caused by gross mistranslations, close to 40% by translations into a related word, and about 50% by differences in how the word is used in Arabic.

pdf
SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases
Svetlana Kiritchenko | Saif Mohammad | Mohammad Salameh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf
What Matters Most in Morphologically Segmented SMT Models?
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
Multiple System Combination for Transliteration
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Adam St Arnaud | Ying Xu | Lei Yao | Grzegorz Kondrak
Proceedings of the Fifth Named Entity Workshop

pdf
Sentiment after Translation: A Case-Study on Arabic Social Media Posts
Mohammad Salameh | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf
Lattice Desegmentation for Statistical Machine Translation
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf
Reversing Morphological Tokenization in English-to-Arabic SMT
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2013 NAACL HLT Student Research Workshop

pdf
Cognate and Misspelling Features for Natural Language Identification
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Lei Yao | Grzegorz Kondrak
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

pdf
Transliteration Experiments on Chinese and Arabic
Grzegorz Kondrak | Xingkai Li | Mohammad Salameh
Proceedings of the 4th Named Entity Workshop (NEWS) 2012