Abstract
This paper describes our system for Task 4 of SemEval 2022: Patronizing and Condescending Language Detection. Patronizing and Condescending Language (PCL) refers to language about vulnerable communities that portrays them in a pitiful way and reflects a sense of superiority. Task 4 involved binary classification (Subtask 1) and multi-label classification (Subtask 2) of PCL. For our system, we experimented with fine-tuning different transformer-based pre-trained models, including BERT, DistilBERT, RoBERTa and ALBERT. Further, we used token-separated metadata to improve our model by helping it contextualize different communities with respect to PCL. We faced the challenge of class imbalance, which we addressed by experimenting with different class weighting schemes. Our models were effective in both subtasks, with the best performance in each coming from models that combined Effective Number of Samples (ENS) class weighting with token-separated metadata. For Subtask 1 and Subtask 2, our best models were fine-tuned BERT and RoBERTa models, respectively.
- Anthology ID:
- 2022.semeval-1.63
- Volume:
- Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 464–472
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.semeval-1.63/
- DOI:
- 10.18653/v1/2022.semeval-1.63
- Cite (ACL):
- Manan Suri. 2022. PiCkLe at SemEval-2022 Task 4: Boosting Pre-trained Language Models with Task Specific Metadata and Cost Sensitive Learning. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 464–472, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- PiCkLe at SemEval-2022 Task 4: Boosting Pre-trained Language Models with Task Specific Metadata and Cost Sensitive Learning (Suri, SemEval 2022)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.semeval-1.63.pdf
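
The Effective Number of Samples (ENS) class weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which a class with n samples has effective number (1 - beta^n) / (1 - beta) and receives a weight proportional to the inverse of that number. The function name `ens_class_weights`, the value of `beta`, the class counts, and the normalization (weights summing to the number of classes) are illustrative assumptions, not details taken from the paper.

```python
def ens_class_weights(class_counts, beta=0.999):
    """Return one weight per class, inversely proportional to its effective number.

    beta and the normalization are assumptions chosen for illustration.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if any(n <= 0 for n in class_counts):
        raise ValueError("every class needs at least one sample")

    # Effective number of samples for each class: (1 - beta^n) / (1 - beta)
    effective = [(1.0 - beta ** n) / (1.0 - beta) for n in class_counts]

    # Rarer classes have a smaller effective number, hence a larger raw weight
    raw = [1.0 / e for e in effective]

    # Rescale so that the weights sum to the number of classes
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical counts for a binary task: 9000 negative, 1000 positive examples
weights = ens_class_weights([9000, 1000])
```

Weights of this kind are typically passed to a weighted loss function so that errors on the minority class cost more. With `beta=0` every class gets the same weight, and as `beta` approaches 1 the scheme approaches plain inverse-frequency weighting.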