Abstract
The task of fake news detection is to distinguish legitimate news articles that describe real facts from those which convey deceiving and fictitious information. As the fake news phenomenon is omnipresent across all languages, it is crucial to be able to efficiently solve this problem for languages other than English. A common approach to this task is supervised classification using features of various complexity. Yet supervised machine learning requires substantial amount of annotated data. For English and a small number of other languages, annotated data availability is much higher, whereas for the vast majority of languages, it is almost scarce. We investigate whether machine translation at its present state could be successfully used as an automated technique for annotated corpora creation and augmentation for fake news detection focusing on the English-Urdu language pair. We train a fake news classifier for Urdu on (1) the manually annotated dataset originally in Urdu and (2) the machine-translated version of an existing annotated fake news dataset originally in English. We show that at the present state of machine translation quality for the English-Urdu language pair, the fully automated data augmentation through machine translation did not provide improvement for fake news detection in Urdu.- Anthology ID:
- 2020.lrec-1.309
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 2537–2542
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.309
- DOI:
- Cite (ACL):
- Maaz Amjad, Grigori Sidorov, and Alisa Zhila. 2020. Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2537–2542, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language (Amjad et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.lrec-1.309.pdf