SubmissionNumber#=%=#69 FinalPaperTitle#=%=#SuteAlbastre at SemEval-2024 Task 4: Predicting Propaganda Techniques in Multilingual Memes using Joint Text and Vision Transformers ShortPaperTitle#=%=# NumberOfPages#=%=#7 CopyrightSigned#=%=#Anghelina Ion-Marian JobTitle#==# Organization#==# Abstract#==#The main goal of this year's SemEval Task 4 is to detect the presence of persuasion techniques in various meme formats. Subtask 1 targets text-only posts, while Subtasks 2a and 2b tackle posts containing both images and captions. The first two subtasks are multi-class, multi-label classification problems over a hierarchical taxonomy of 22 persuasion techniques. This paper proposes a solution for persuasion detection in both scenarios and for caption text in various languages. Our team's main approach is a multimodal neural network architecture with textual and vision Transformers as its backbone. The models we experimented with include EfficientNet and ViT as visual encoders and BERT and GPT-2 as textual encoders. Author{1}{Firstname}#=%=#Ion Marian Author{1}{Lastname}#=%=#Anghelina Author{1}{Username}#=%=#ionanghelina Author{1}{Email}#=%=#ion.anghelina@s.unibuc.ro Author{1}{Affiliation}#=%=#University of Bucharest Author{2}{Firstname}#=%=#Gabriel Sebastian Author{2}{Lastname}#=%=#Buță Author{2}{Username}#=%=#butasebi Author{2}{Email}#=%=#butasebi@yahoo.com Author{2}{Affiliation}#=%=#University of Bucharest Author{3}{Firstname}#=%=#Alexandru Author{3}{Lastname}#=%=#Enache Author{3}{Username}#=%=#alexandruenache Author{3}{Email}#=%=#enache.g.alexandru@gmail.com Author{3}{Affiliation}#=%=#University of Bucharest ==========