SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection

Daryna Dementieva; Igor Markov; Alexander Panchenko

doi:10.18653/v1/2020.semeval-1.234

Abstract

This paper presents a solution for the Span Identification (SI) task in the “Detection of Propaganda Techniques in News Articles” competition at SemEval-2020. The goal of the SI task is to identify the specific fragments of each article that contain at least one propaganda technique, which makes it a binary sequence tagging task. We tested several approaches and selected a fine-tuned BERT model as our baseline. Our main contribution is an investigation of several unsupervised data augmentation techniques, based on distributional semantics, that expand the original small training dataset for this BERT-based sequence tagger. We explore various expansion strategies and show that they can substantially shift the balance between precision and recall while maintaining comparable F1 scores.

Anthology ID: 2020.semeval-1.234
Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month: December
Year: 2020
Address: Barcelona (online)
Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue: SemEval
SIG: SIGLEX
Publisher: International Committee for Computational Linguistics
Pages: 1786–1792
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.semeval-1.234/
DOI: 10.18653/v1/2020.semeval-1.234
Cite (ACL): Daryna Dementieva, Igor Markov, and Alexander Panchenko. 2020. SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1786–1792, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal): SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection (Dementieva et al., SemEval 2020)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.semeval-1.234.pdf
