NLP_UIOWA at Semeval-2021 Task 5: Transferring Toxic Sets to Tag Toxic Spans

Jonathan Rusert


Abstract
We leverage a BLSTM with attention to identify toxic spans in texts. We explore different dimensions which affect the model’s performance. The first dimension explored is the toxic set the model is trained on. Besides the provided dataset, we explore the transferability of 5 different toxic-related sets, including offensive, toxic, abusive, and hate sets. We find that the solely offensive set shows the highest promise of transferability. The second dimension we explore is methodology, including leveraging attention, employing a greedy remove method, using a frequency ratio, and examining hybrid combinations of multiple methods. We conduct an error analysis to examine which types of toxic spans were missed and which were wrongly inferred as toxic, along with the main reasons why these errors occurred. Finally, we extend our method via ensembles, which achieve our highest F1 score of 55.1.
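To make the reported F1 score concrete, the following is a minimal sketch of a character-offset F1 of the kind used for toxic span tagging. It assumes that each post is scored by comparing the set of predicted toxic character offsets against the gold set, and that per-post scores are then averaged. The function names `span_f1` and `mean_span_f1` are hypothetical and not taken from the paper, and the handling of empty sets is an assumption rather than a confirmed detail of the official scorer.

```python
from typing import Iterable, Sequence, Set


def span_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post."""
    pred_set: Set[int] = set(predicted)
    gold_set: Set[int] = set(gold)

    # Assumption: a post with no toxic characters, predicted as such, is a perfect match.
    if not pred_set and not gold_set:
        return 1.0
    # Assumption: if only one side is empty, the score is zero.
    if not pred_set or not gold_set:
        return 0.0

    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(
    predictions: Sequence[Iterable[int]],
    golds: Sequence[Iterable[int]],
) -> float:
    """Average the per-post span F1 over a collection of posts."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not predictions:
        return 0.0
    scores = [span_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

For example, `span_f1([0, 1, 2, 3], [2, 3, 4, 5])` shares two of four offsets on each side, giving precision 0.5, recall 0.5, and an F1 of 0.5. If the abstract's figure is a percentage, 55.1 would correspond to 0.551 on this 0 to 1 scale.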
Anthology ID:
2021.semeval-1.119
Volume:
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
881–887
URL:
https://aclanthology.org/2021.semeval-1.119
DOI:
10.18653/v1/2021.semeval-1.119
Cite (ACL):
Jonathan Rusert. 2021. NLP_UIOWA at Semeval-2021 Task 5: Transferring Toxic Sets to Tag Toxic Spans. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 881–887, Online. Association for Computational Linguistics.
Cite (Informal):
NLP_UIOWA at Semeval-2021 Task 5: Transferring Toxic Sets to Tag Toxic Spans (Rusert, SemEval 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2021.semeval-1.119.pdf
Data:
OLID