Ayman Al Zaatari
2019
Assessing Arabic Weblog Credibility via Deep Co-learning
Chadi Helwe | Shady Elbassuoni | Ayman Al Zaatari | Wassim El-Hajj
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs, or blogs for short. Some recent work has attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained on a small labeled dataset, each using a different view of the data. Each of these classifiers is then used to classify unlabeled data, and its predictions are used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines, including fully supervised deep learning models as well as ensemble models.
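The co-learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses deep neural networks, whereas the sketch below swaps in a tiny nearest-centroid classifier so that it runs with the standard library alone. The names `CentroidClassifier` and `co_train`, the two views, the feature values, and the `rounds` and `per_round` parameters are all hypothetical and chosen only for illustration.

```python
import math


class CentroidClassifier:
    """Tiny stand-in for a weak classifier (the paper uses deep networks)."""

    def fit(self, X, y):
        # One centroid per class: the mean of that class's feature vectors.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_confidence(self, x):
        # Confidence = gap between the two nearest centroids (needs >= 2 classes).
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        (d0, label), (d1, _) = dists[0], dists[1]
        return label, d1 - d0


def co_train(labeled_views, labels, unlabeled_views, rounds=3, per_round=2):
    """Train one classifier per view; each one pseudo-labels data for the others."""
    n_views = len(labeled_views)
    train_X = [list(view) for view in labeled_views]
    train_y = [list(labels) for _ in range(n_views)]
    pool = set(range(len(unlabeled_views[0])))  # indices of still-unlabeled items
    models = [CentroidClassifier() for _ in range(n_views)]

    for _ in range(rounds):
        for v in range(n_views):
            models[v].fit(train_X[v], train_y[v])
        for v in range(n_views):
            # Score the remaining pool with classifier v on its own view.
            scored = []
            for i in pool:
                label, conf = models[v].predict_with_confidence(unlabeled_views[v][i])
                scored.append((conf, i, label))
            scored.sort(reverse=True)
            # Hand the most confident predictions to the OTHER classifiers.
            for conf, i, label in scored[:per_round]:
                pool.discard(i)
                for other in range(n_views):
                    if other != v:
                        train_X[other].append(unlabeled_views[other][i])
                        train_y[other].append(label)

    # Final refit so the last round's pseudo-labels are used.
    for v in range(n_views):
        models[v].fit(train_X[v], train_y[v])
    return models


# Hypothetical toy data: view A and view B describe the same blog posts.
labeled_a = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.0]]
labeled_b = [[0.1], [0.0], [0.9], [1.0]]
labels = ["not_credible", "not_credible", "credible", "credible"]
unlabeled_a = [[0.1, 0.2], [0.8, 0.9], [0.3, 0.1], [0.95, 0.8]]
unlabeled_b = [[0.2], [0.8], [0.1], [0.95]]

models = co_train([labeled_a, labeled_b], labels, [unlabeled_a, unlabeled_b])
print(models[0].predict_with_confidence([0.9, 0.9])[0])
print(models[1].predict_with_confidence([0.05])[0])
```

The key design point, under these assumptions, is that a classifier never trains on its own pseudo-labels: confident predictions made from one view only extend the training sets of the classifiers that use the other views.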
2016
Arabic Corpora for Credibility Analysis
Ayman Al Zaatari | Rim El Ballouli | Shady ELbassouni | Wassim El-Hajj | Hazem Hajj | Khaled Shaban | Nizar Habash | Emad Yahya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
A significant portion of the data generated on blogging and microblogging websites is non-credible, as shown in many recent studies. To filter out such non-credible information, machine learning can be deployed to build automatic credibility classifiers. However, as is the case with most supervised machine learning approaches, a sufficiently large and accurate training dataset must be available. In this paper, we focus on building a public Arabic corpus of blogs and microblogs that can be used for credibility classification. We focus on Arabic due to the recent popularity of blogs and microblogs in the Arab World and due to the lack of any such public corpora in Arabic. We discuss our data acquisition approach and annotation process, provide a rigorous analysis of the annotated data, and finally report some results on the effectiveness of our data for credibility classification.
Co-authors
- Chadi Helwe 1
- Emad Yahya 1
- Hazem Hajj 1
- Khaled Shaban 1
- Nizar Habash 1