Andrei Paraschiv


2021

pdf bib
UPB at SemEval-2021 Task 5: Virtual Adversarial Training for Toxic Spans Detection
Andrei Paraschiv | Dumitru-Clementin Cercel | Mihai Dascalu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

The real-world impact of polarization and toxicity in the online sphere marked the end of 2020 and the beginning of this year in a negative way. Semeval-2021, Task 5 - Toxic Spans Detection is based on a novel annotation of a subset of the Jigsaw Unintended Bias dataset and is the first language toxicity detection task dedicated to identifying the toxicity-level spans. For this task, participants had to automatically detect character spans in short comments that render the message as toxic. Our model considers applying Virtual Adversarial Training in a semi-supervised setting during the fine-tuning process of several Transformer-based models (i.e., BERT and RoBERTa), in combination with Conditional Random Fields. Our approach leads to performance improvements and more robust models, enabling us to achieve an F1-score of 65.73% in the official submission and an F1-score of 66.13% after further tuning during post-evaluation.

2020

pdf bib
UPB at SemEval-2020 Task 11: Propaganda Detection with Domain-Specific Trained BERT
Andrei Paraschiv | Dumitru-Clementin Cercel | Mihai Dascalu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Manipulative and misleading news have become a commodity for some online news outlets and these news have gained a significant impact on the global mindset of people. Propaganda is a frequently employed manipulation method having as goal to influence readers by spreading ideas meant to distort or manipulate their opinions. This paper describes our participation in the SemEval-2020, Task 11: Detection of PropagandaTechniques in News Articles competition. Our approach considers specializing a pre-trained BERT model on propagandistic and hyperpartisan news articles, enabling it to create more adequate representations for the two subtasks, namely propaganda Span Identification (SI) and propaganda Technique Classification (TC). Our proposed system achieved a F1-score of 46.060% in subtask SI, ranking 5th in the leaderboard from 36 teams and a micro-averaged F1 score of 54.302% for subtask TC, ranking 19th from 32 teams.