Abstract
This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We found that the inclusion of this spatio-temporal data improved classification performance for all language and task conditions when compared with the baseline. We also retrained a subset of models with simulated script-mixed social media language data, which yielded varied performance across conditions. The results from the current study suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
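The two-stage pipeline the abstract describes (continued pretraining on region- and period-specific social media text, then task fine-tuning) can be sketched with the Hugging Face transformers library. This is a minimal sketch: the XLM-RoBERTa checkpoint name is real, but the dataset path, label count, and hyperparameters below are illustrative assumptions, not the authors' settings.

```python
# Sketch: domain-adaptive ("spatio-temporal") retraining of XLM-RoBERTa,
# followed by multiclass fine-tuning. Paths, label counts, and
# hyperparameters are illustrative assumptions only.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Step 1: continue masked-language-model pretraining on social media text
# drawn from the relevant places and time period (hypothetical local file).
mlm_texts = load_dataset("text", data_files={"train": "regional_social_media.txt"})
mlm_texts = mlm_texts.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)
mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("xlm-roberta-base"),
    args=TrainingArguments(output_dir="xlmr-spatiotemporal", num_train_epochs=1),
    train_dataset=mlm_texts["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("xlmr-spatiotemporal")

# Step 2: fine-tune the retrained encoder as a multiclass classifier
# (the shared task defines the label set; three classes assumed here).
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "xlmr-spatiotemporal", num_labels=3
)
# ...then tokenize the labelled task data and train with a second Trainer,
# as above but with the classification model and annotated comments.
```

The "simulated script-mixed" data can be illustrated by transliterating native-script comments into Latin script, as users often type on social media. The indic-transliteration package is one library that supports this for Dravidian scripts; whether the authors used it, or this exact scheme, is an assumption.

```python
# Sketch: romanising a Tamil-script comment to simulate script mixing.
from indic_transliteration import sanscript

tamil_comment = "வணக்கம்"  # "hello" in Tamil script
romanised = sanscript.transliterate(tamil_comment, sanscript.TAMIL, sanscript.HK)
print(romanised)  # Latin-script rendering of the same comment
```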
- Anthology ID:
- 2023.ltedi-1.15
- Volume:
- Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Bharathi R. Chakravarthi, B. Bharathi, Josephine Griffith, Kalika Bali, Paul Buitelaar
- Venues:
- LTEDI | WS
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 103–108
- URL:
- https://aclanthology.org/2023.ltedi-1.15
- Cite (ACL):
- Sidney Wong, Matthew Durward, Benjamin Adams, and Jonathan Dunn. 2023. cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models. In Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion, pages 103–108, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models (Wong et al., LTEDI-WS 2023)
- PDF:
- https://aclanthology.org/2023.ltedi-1.15.pdf