Benchmark Dataset for Propaganda Detection in Czech Newspaper Texts

Vít Baisa, Ondřej Herman, Ales Horak


Abstract
Propaganda of various pressure groups, ranging from big economies to ideological blocs, is often presented in the form of objective newspaper texts. However, this apparent objectivity is undermined by the support of imbalanced views and distorted attitudes, expressed by means of various manipulative stylistic techniques. In the project Manipulative Propaganda Techniques in the Age of Internet, a new resource is being developed for the automatic analysis of stylistic mechanisms used to influence readers' opinions. In its current version, the resource consists of 7,494 newspaper articles from four selected Czech digital news servers, annotated for the presence of specific manipulative techniques. In this paper, we present the current state of the annotations and describe the structure of the dataset in detail. We also offer an evaluation of bag-of-words classification algorithms for the annotated manipulative techniques.
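The abstract does not say which bag-of-words classifiers were evaluated, so the following is only a minimal sketch of one common choice, multinomial naive Bayes with add-one smoothing. It assumes each manipulative technique is treated as a separate presence/absence label on whole articles. The class names `manipulative`/`neutral`, the helper names, and the toy English sentences are hypothetical, invented for illustration; they are not drawn from the dataset and say nothing about its results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters; \w is Unicode-aware,
    # so Czech diacritics would be kept inside tokens.
    return re.findall(r"\w+", text.lower())


def bag_of_words(text):
    # A bag of words discards word order and keeps only token counts.
    return Counter(tokenize(text))


class NaiveBayesBoW:
    """Hypothetical multinomial naive Bayes over bag-of-words counts."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        bow = bag_of_words(text)
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus log likelihood of each known word (add-one smoothing).
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for word, count in bow.items():
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                score += count * math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, invented training data for a single hypothetical technique label.
train_texts = [
    "they always lie to us, everyone knows it",
    "the enemy will destroy everything we love",
    "the ministry published the budget on monday",
    "the council approved the new road project",
]
train_labels = ["manipulative", "manipulative", "neutral", "neutral"]

model = NaiveBayesBoW().fit(train_texts, train_labels)
print(model.predict("everyone knows they lie"))
print(model.predict("the council approved the budget"))
```

Because a bag of words ignores word order, such a model can only pick up lexical cues; techniques signalled mainly by sentence structure or discourse would be out of its reach.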
Anthology ID:
R19-1010
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
77–83
URL:
https://aclanthology.org/R19-1010
DOI:
10.26615/978-954-452-056-4_010
Cite (ACL):
Vít Baisa, Ondřej Herman, and Ales Horak. 2019. Benchmark Dataset for Propaganda Detection in Czech Newspaper Texts. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 77–83, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Benchmark Dataset for Propaganda Detection in Czech Newspaper Texts (Baisa et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/update-css-js/R19-1010.pdf