Abstract
The purpose of this paper is to ascertain the influence of sociocultural factors (i.e., social, cultural, and political) in the development of hate speech detection systems. We set out to investigate the suitability of using open-source training data to monitor levels of anti-LGBTQ+ content on social media across different national varieties of English. Our findings suggest that the social and cultural alignment of open-source hate speech data sets influences the predicted outputs. Furthermore, the keyword-search approach to anti-LGBTQ+ slurs used in the development of open-source training data encourages detection models to overfit on slurs; therefore, anti-LGBTQ+ content that does not contain slurs may go undetected. We recommend combining empirical outputs with qualitative insights to ensure these systems are fit for purpose.
- Anthology ID:
- 2024.c3nlp-1.7
- Volume:
- Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Laura Cabello, Yong Cao, Ife Adebara, Li Zhou
- Venues:
- C3NLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 84–97
- URL:
- https://aclanthology.org/2024.c3nlp-1.7
- DOI:
- 10.18653/v1/2024.c3nlp-1.7
- Cite (ACL):
- Sidney Wong. 2024. Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 84–97, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media (Wong, C3NLP-WS 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.c3nlp-1.7.pdf