CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection

Michail Mersinias, Stergos Afantenos, Georgios Chalkiadakis


Abstract
In recent years, fake news detection has been an emerging research area. In this paper, we put forward a novel statistical approach for the generation of feature vectors to describe a document. Our approach, called class label frequency distance (clfd), is shown experimentally to provide an effective way of boosting the performance of machine learning methods. Specifically, our experiments, carried out in the fake news detection domain, verify that efficient traditional machine learning methods that use our vectorization approach consistently outperform deep learning methods that use word embeddings on small and medium-sized datasets, while the results are comparable on large datasets. In addition, we demonstrate that a novel hybrid method that combines a clfd-boosted logistic regression classifier with a deep learning one clearly outperforms deep learning methods even on large datasets.
Anthology ID:
2020.lrec-1.427
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3475–3483
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.427
Cite (ACL):
Michail Mersinias, Stergos Afantenos, and Georgios Chalkiadakis. 2020. CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3475–3483, Marseille, France. European Language Resources Association.
Cite (Informal):
CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection (Mersinias et al., LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.427.pdf