Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic

Yakoob Khan; Weicheng Ma; Soroush Vosoughi

doi:10.18653/v1/2021.semeval-1.132

Abstract

This paper describes our approach to the Toxic Spans Detection problem (SemEval-2021 Task 5). We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text and applies additional post-processing steps to refine the span boundaries. The post-processing steps involve (1) labeling character offsets between consecutive toxic tokens as toxic and (2) assigning a toxic label to words that have at least one token labeled as toxic. Through experiments, we show that these two post-processing steps improve the performance of our model by 4.16% on the test set. We also study the effects of data augmentation and ensemble modeling strategies on our system. Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine 17th out of 91 teams in the competition. Our code is available at https://github.com/Yakoob-Khan/Toxic-Spans-Detection
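The two post-processing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the linked repository holds that); it assumes each token arrives as a hypothetical `(start, end, is_toxic)` triple with an exclusive end offset, and that a "word" is a maximal run of `\w` characters. The names `Token`, `fill_between_toxic_tokens`, `expand_to_words`, and `postprocess` are invented for illustration.

```python
import re
from typing import List, Set, Tuple

# Hypothetical token representation: (start offset, exclusive end offset, is_toxic).
Token = Tuple[int, int, bool]


def fill_between_toxic_tokens(tokens: List[Token]) -> Set[int]:
    """Step 1: collect toxic token offsets, plus any characters lying
    between two adjacent tokens that are both labeled toxic."""
    offsets: Set[int] = set()
    for start, end, toxic in tokens:
        if toxic:
            offsets.update(range(start, end))
    for (_, end1, toxic1), (start2, _, toxic2) in zip(tokens, tokens[1:]):
        if toxic1 and toxic2:
            offsets.update(range(end1, start2))
    return offsets


def expand_to_words(text: str, offsets: Set[int]) -> Set[int]:
    """Step 2: if any character of a word is toxic, mark the whole word.
    Assumption: a word is a maximal run of \\w characters."""
    expanded = set(offsets)
    for match in re.finditer(r"\w+", text):
        span = range(match.start(), match.end())
        if any(i in offsets for i in span):
            expanded.update(span)
    return expanded


def postprocess(text: str, tokens: List[Token]) -> List[int]:
    """Apply both steps and return sorted toxic character offsets."""
    return sorted(expand_to_words(text, fill_between_toxic_tokens(tokens)))


# Made-up example: sub-word pieces "stup" + "id", with only "id" and
# "idiot" predicted toxic by the (hypothetical) token classifier.
text = "You are a stupid idiot"
tokens = [
    (0, 3, False),    # You
    (4, 7, False),    # are
    (8, 9, False),    # a
    (10, 14, False),  # stup
    (14, 16, True),   # id
    (17, 22, True),   # idiot
]
toxic_offsets = postprocess(text, tokens)
```

In this made-up example, step 1 adds the space at offset 16 between the two toxic tokens, and step 2 extends the label from the sub-word piece "id" to the whole word "stupid", so `toxic_offsets` covers offsets 10 through 21.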

Anthology ID: 2021.semeval-1.132
Volume: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Month: August
Year: 2021
Address: Online
Editors: Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 967–973
URL: https://preview.aclanthology.org/add_missing_videos/2021.semeval-1.132/
DOI: 10.18653/v1/2021.semeval-1.132
Cite (ACL): Yakoob Khan, Weicheng Ma, and Soroush Vosoughi. 2021. Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 967–973, Online. Association for Computational Linguistics.
Cite (Informal): Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic (Khan et al., SemEval 2021)
PDF: https://preview.aclanthology.org/add_missing_videos/2021.semeval-1.132.pdf
Code: Yakoob-Khan/Toxic-Spans-Detection
Data: HateXplain
