Abstract
In this paper, we extend prior work on benchmarking GPT by turning GPT models into classifiers and applying them to three Twitter datasets for Hate-Speech Detection, Offensive Language Detection, and Emotion Classification. We use Zero-Shot and Few-Shot approaches to evaluate the classification capabilities of the GPT models. Our results show that GPT models do not always beat fine-tuned models on the tested benchmarks. However, in Hate-Speech and Emotion Detection, a Few-Shot approach can achieve state-of-the-art performance. The results also reveal that GPT-4 is more sensitive to the examples given in a Few-Shot prompt, highlighting the importance of choosing fitting examples for inference and prompt formulation.
- Anthology ID:
- 2024.trac-1.14
- Volume:
- Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Bharathi Raja Chakravarthi, Bornini Lahiri, Siddharth Singh, Shyam Ratan
- Venues:
- TRAC | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 126–133
- URL:
- https://aclanthology.org/2024.trac-1.14
- Cite (ACL):
- Nikolaj Bauer, Moritz Preisig, and Martin Volk. 2024. Offensiveness, Hate, Emotion and GPT: Benchmarking GPT3.5 and GPT4 as Classifiers on Twitter-specific Datasets. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024, pages 126–133, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Offensiveness, Hate, Emotion and GPT: Benchmarking GPT3.5 and GPT4 as Classifiers on Twitter-specific Datasets (Bauer et al., TRAC-WS 2024)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2024.trac-1.14.pdf