SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection
Flor Miriam Plaza-del-Arco, Pilar López-Úbeda, L. Alfonso Ureña-López, M. Teresa Martín-Valdivia
Abstract
This paper describes the participation of the SINAI team in Task 5: Toxic Spans Detection, which consists of identifying the spans that make a text toxic. Although several resources and systems have been developed so far in the context of offensive language, both annotation and tasks have mainly focused on classifying whether a text is offensive or not. However, detecting toxic spans is crucial for identifying why a text is toxic and can help human moderators locate this type of content on social media. To accomplish the task, we follow a deep learning-based approach using a bidirectional variant of a Long Short-Term Memory network with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test the performance of combinations of different pre-trained word embeddings for recognizing toxic entities in text. The results show that combining word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.
- Anthology ID:
- 2021.semeval-1.134
- Volume:
- Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Alexis Palmer, Nathan Schneider, Natalie Schluter, Guy Emerson, Aurelie Herbelot, Xiaodan Zhu
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 984–989
- URL:
- https://aclanthology.org/2021.semeval-1.134
- DOI:
- 10.18653/v1/2021.semeval-1.134
- Cite (ACL):
- Flor Miriam Plaza-del-Arco, Pilar López-Úbeda, L. Alfonso Ureña-López, and M. Teresa Martín-Valdivia. 2021. SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 984–989, Online. Association for Computational Linguistics.
- Cite (Informal):
- SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection (Plaza-del-Arco et al., SemEval 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.semeval-1.134.pdf
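
The abstract frames toxic span detection as recognizing toxic entities with a sequence labeller (BiLSTM-CRF). As a rough illustration only, and not the authors' code, the sketch below shows one common way such a task can be prepared for a tagger: converting character-level toxic offsets into token-level B/I/O tags. The function name `spans_to_bio`, the whitespace tokenization, the B/I/O tag scheme, and the example sentence are all assumptions made for this illustration; the paper itself should be consulted for the preprocessing actually used.

```python
import re


def spans_to_bio(text, toxic_offsets):
    """Tag whitespace-delimited tokens as B, I, or O.

    A token counts as toxic if any of its characters appears in
    `toxic_offsets` (a list of character positions). Hypothetical
    helper written for illustration, not taken from the paper.
    """
    toxic = set(toxic_offsets)
    tokens, tags = [], []
    prev_toxic = False
    for match in re.finditer(r"\S+", text):
        is_toxic = any(i in toxic for i in range(match.start(), match.end()))
        if not is_toxic:
            tag = "O"
        elif prev_toxic:
            tag = "I"  # continues the toxic span opened by the previous token
        else:
            tag = "B"  # first token of a new toxic span
        tokens.append(match.group())
        tags.append(tag)
        prev_toxic = is_toxic
    return tokens, tags


# Illustrative input: mark the word "stupid" as the toxic span.
example_text = "this is a stupid idea"
start = example_text.find("stupid")
example_offsets = list(range(start, start + len("stupid")))

example_tokens, example_tags = spans_to_bio(example_text, example_offsets)
for token, tag in zip(example_tokens, example_tags):
    print(f"{token}\t{tag}")
```

A tagger trained on such token/tag pairs predicts one tag per token, and the predicted B/I tokens can then be mapped back to character offsets for span-level scoring.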