Abstract
This paper describes our system for detecting homophobia/transphobia in social media comments, developed as part of the shared task at LT-EDI-2024. We took a transformer-based approach to develop our multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for the other language conditions. The results suggest that incorporating elements of paralinguistic behaviour, such as script-switching, may improve the performance of language detection systems, especially for under-resourced language conditions.
- Anthology ID:
- 2024.ltedi-1.19
- Volume:
- Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion
- Month:
- March
- Year:
- 2024
- Address:
- St. Julian's, Malta
- Editors:
- Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Thenmozhi Durairaj, György Kovács, Miguel Ángel García Cumbreras
- Venues:
- LTEDI | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 177–183
- URL:
- https://aclanthology.org/2024.ltedi-1.19
- Cite (ACL):
- Sidney Wong and Matthew Durward. 2024. cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages. In Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 177–183, St. Julian's, Malta. Association for Computational Linguistics.
- Cite (Informal):
- cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages (Wong & Durward, LTEDI-WS 2024)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2024.ltedi-1.19.pdf