@inproceedings{nakwijit-etal-2023-lexicools,
  title     = {Lexicools at {S}em{E}val-2023 Task 10: Sexism Lexicon Construction via {XAI}},
  author    = {Nakwijit, Pakawat and
               Samir, Mahmoud and
               Purver, Matthew},
  editor    = {Ojha, Atul Kr. and
               Do{\u{g}}ru{\"o}z, A. Seza and
               Da San Martino, Giovanni and
               Tayyar Madabushi, Harish and
               Kumar, Ritesh and
               Sartori, Elisa},
  booktitle = {Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)},
  month     = jul,
  year      = {2023},
  address   = {Toronto, Canada},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.semeval-1.4/},
  doi       = {10.18653/v1/2023.semeval-1.4},
  pages     = {23--43},
  abstract  = {This paper presents our work on the SemEval-2023 Task 10 Explainable Detection of Online Sexism (EDOS) using lexicon-based models. Our approach consists of three main steps: lexicon construction based on Pointwise Mutual Information (PMI) and Shapley value, lexicon augmentation using an unannotated corpus and Large Language Models (LLMs), and, lastly, lexical incorporation for Bag-of-Word (BoW) logistic regression and fine-tuning LLMs. Our results demonstrate that our Shapley approach effectively produces a high-quality lexicon. We also show that by simply counting the presence of certain words in our lexicons and comparing the count can outperform a BoW logistic regression in task B/C and fine-tuning BERT in task C. In the end, our classifier achieved F1-scores of 53.34\% and 27.31\% on the official blind test sets for tasks B and C, respectively. We, additionally, provide in-depth analysis highlighting model limitation and bias. We also present our attempts to understand the model's behaviour based on our constructed lexicons. Our code and the resulting lexicons are open-sourced in our GitHub repository \url{https://github.com/SirBadr/SemEval2022-Task10}.},
}
Markdown (Informal)
[Lexicools at SemEval-2023 Task 10: Sexism Lexicon Construction via XAI](https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.semeval-1.4/) (Nakwijit et al., SemEval 2023)
ACL