A Semi-Supervised Approach to Detect Toxic Comments

Ghivvago Damas Saraiva, Rafael Anchiêta, Francisco Assis Ricarte Neto, Raimundo Moura


Abstract
Toxic comments contain forms of non-acceptable language targeted towards groups or individuals. These types of comments have become a serious concern for government organizations, online communities, and social media platforms. Although some approaches exist to handle non-acceptable language, most of them focus on supervised learning and the English language. In this paper, we treat toxic comment detection as a semi-supervised learning problem over a heterogeneous graph. We evaluate the approach on a Portuguese-language dataset of toxic comments, outperforming several graph-based methods and achieving competitive results compared to transformer architectures.
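To make the idea of semi-supervised learning over a heterogeneous graph concrete, the following is a minimal sketch and not the authors' method. It assumes a graph with two node types (comments and the words they contain) and uses plain label propagation with clamped seed labels as a stand-in technique; the abstract does not specify the paper's actual graph construction or learning algorithm. The function names `build_graph` and `propagate`, the toy English comments, and the 0.5 decision threshold are all hypothetical.

```python
from collections import defaultdict


def build_graph(comments):
    """Build a two-type graph: comment nodes ("c", i) linked to word nodes ("w", token).

    Assumption: whitespace tokenization and comment-word edges only.
    """
    neighbors = defaultdict(set)
    for i, text in enumerate(comments):
        for token in set(text.lower().split()):
            neighbors[("c", i)].add(("w", token))
            neighbors[("w", token)].add(("c", i))
    return neighbors


def propagate(neighbors, seed_labels, iterations=20):
    """Spread toxicity scores from a few labeled comments to every other node.

    seed_labels maps a comment index to 1.0 (toxic) or 0.0 (non-toxic).
    Labeled nodes stay clamped; every other node takes the mean score
    of its neighbors at each iteration.
    """
    scores = {node: 0.5 for node in neighbors}  # 0.5 means "unknown"
    clamped = {("c", i): label for i, label in seed_labels.items()}
    scores.update(clamped)
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in clamped:
                updated[node] = clamped[node]
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Toy data (hypothetical): only the first two comments carry labels.
comments = [
    "you are stupid idiot",  # labeled toxic
    "have a nice day",       # labeled non-toxic
    "stupid idiot",          # unlabeled
    "nice day today",        # unlabeled
]
scores = propagate(build_graph(comments), seed_labels={0: 1.0, 1: 0.0})
for i in (2, 3):
    verdict = "toxic" if scores[("c", i)] > 0.5 else "non-toxic"
    print(f"comment {i}: {verdict}")
```

In this sketch the unlabeled comments receive scores only through the words they share with labeled comments, which is the basic reason a graph formulation can make use of unlabeled data.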
Anthology ID:
2021.ranlp-1.142
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
1261–1267
URL:
https://aclanthology.org/2021.ranlp-1.142
Cite (ACL):
Ghivvago Damas Saraiva, Rafael Anchiêta, Francisco Assis Ricarte Neto, and Raimundo Moura. 2021. A Semi-Supervised Approach to Detect Toxic Comments. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 1261–1267, Held Online. INCOMA Ltd.
Cite (Informal):
A Semi-Supervised Approach to Detect Toxic Comments (Saraiva et al., RANLP 2021)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.ranlp-1.142.pdf
Code:
rafaelanchieta/toxic
Data:
ToLD-Br