Mohammad Salameh
2019
ADIDA: Automatic Dialect Identification for Arabic
Ossama Obeid | Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)
This demo paper describes ADIDA, a web-based system for automatic dialect identification for Arabic text. The system distinguishes among the dialects of 25 Arab cities (from Rabat to Muscat) in addition to Modern Standard Arabic. The results are presented with either a point map or a heat map visualizing the automatic identification probabilities over a geographical map of the Arab World.
2018
Fine-Grained Arabic Dialect Identification
Mohammad Salameh | Houda Bouamor | Nizar Habash
Proceedings of the 27th International Conference on Computational Linguistics
Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic – a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% for sentences with an average length of 7 words (a 9% relative error reduction over the state-of-the-art technique for Arabic dialect identification) and reach more than 90% when we consider 16 words. We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.
The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor | Nizar Habash | Mohammad Salameh | Wajdi Zaghouani | Owen Rambow | Dana Abdulrahim | Ossama Obeid | Salam Khalifa | Fadhl Eryani | Alexander Erdmann | Kemal Oflazer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Unified Guidelines and Resources for Arabic Dialect Orthography
Nizar Habash | Fadhl Eryani | Salam Khalifa | Owen Rambow | Dana Abdulrahim | Alexander Erdmann | Reem Faraj | Wajdi Zaghouani | Houda Bouamor | Nasser Zalmout | Sara Hassan | Faisal Al-Shargi | Sakhar Alkhereyf | Basma Abdulkareem | Ramy Eskander | Mohammad Salameh | Hind Saddiki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
SemEval-2018 Task 1: Affect in Tweets
Saif Mohammad | Felipe Bravo-Marquez | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the 12th International Workshop on Semantic Evaluation
We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful. We also analyze systems for consistent bias towards a particular race or gender. The data is made freely available to further improve our understanding of how people convey emotions through language.
2016
Sentiment Lexicons for Arabic Social Media
Saif Mohammad | Mohammad Salameh | Svetlana Kiritchenko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Existing Arabic sentiment lexicons have low coverage, with only a few thousand entries. In this paper, we present several large sentiment lexicons that were automatically generated using two different methods: (1) by using distant supervision techniques on Arabic tweets, and (2) by translating English sentiment lexicons into Arabic using a freely available statistical machine translation system. We compare the usefulness of new and old sentiment lexicons in the downstream application of sentence-level sentiment analysis. Our baseline sentiment analysis system uses numerous surface form features. Nonetheless, the system benefits from using additional features drawn from sentiment lexicons. The best result is obtained using the automatically generated Dialectal Hashtag Lexicon and the Arabic translations of the NRC Emotion Lexicon (accuracy of 66.6%). Finally, we describe a qualitative study of the automatic translations of English sentiment lexicons into Arabic, which shows that about 88% of the automatically translated entries are valid for Arabic as well. Close to 10% of the invalid entries are caused by gross mistranslations, close to 40% by translations into a related word, and about 50% by differences in how the word is used in Arabic.
Integrating Morphological Desegmentation into Phrase-based Decoding
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases
Svetlana Kiritchenko | Saif Mohammad | Mohammad Salameh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
2015
Sentiment after Translation: A Case-Study on Arabic Social Media Posts
Mohammad Salameh | Saif Mohammad | Svetlana Kiritchenko
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
What Matters Most in Morphologically Segmented SMT Models?
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
Multiple System Combination for Transliteration
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Adam St Arnaud | Ying Xu | Lei Yao | Grzegorz Kondrak
Proceedings of the Fifth Named Entity Workshop
2014
Lattice Desegmentation for Statistical Machine Translation
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2013
Reversing Morphological Tokenization in English-to-Arabic SMT
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2013 NAACL HLT Student Research Workshop
Cognate and Misspelling Features for Natural Language Identification
Garrett Nicolai | Bradley Hauer | Mohammad Salameh | Lei Yao | Grzegorz Kondrak
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications
2012
Co-authors
- Grzegorz Kondrak 7
- Houda Bouamor 4
- Colin Cherry 4
- Nizar Habash 4
- Svetlana Kiritchenko 4
- Saif Mohammad 4
- Dana Abdulrahim 2
- Alexander Erdmann 2
- Fadhl Eryani 2
- Bradley Hauer 2
- Salam Khalifa 2
- Garrett Nicolai 2
- Ossama Obeid 2
- Owen Rambow 2
- Lei Yao 2
- Wajdi Zaghouani 2
- Basma Abdulkareem 1
- Faisal Al-Shargi 1
- Sakhar Alkhereyf 1
- Felipe Bravo-Marquez 1
- Ramy Eskander 1
- Reem Faraj 1
- Sara Hassan 1
- Xingkai Li 1
- Kemal Oflazer 1
- Hind Saddiki 1
- Adam St Arnaud 1
- Ying Xu 1
- Nasser Zalmout 1