Teagan Johnson
2026
The Counterfactuals at SemEval-2026 Task 9: Can Counterfactually-Inspired Preprocessing help Detect Polarization?
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper presents the English-language submissions of The Counterfactuals team for the three subtasks of Task 9 at SemEval 2026. The task aims to detect multicultural online polarization, including how it is expressed and in what contexts. It provides a high-quality annotated dataset of posts that follows a three-level schema: polarized or not (subtask 1), polarization type classification (subtask 2), and manifestation identification (subtask 3). I construct a lexicon based on pointwise mutual information (PMI) that identifies words highly correlated with the polarized class as labeled in subtask 1. Using this lexicon, I implement a data augmentation technique based on a large language model. I then use the preprocessed datasets to fine-tune a BERT-style model (BERTweet) for each subtask. My highest-performing models placed 48th out of 60, 35th out of 36, and 17th out of 24 on subtasks 1, 2, and 3, respectively. All code is available on GitHub.
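As an illustration of the PMI-based lexicon idea, the following is a minimal sketch and not the submitted implementation. It assumes whitespace tokenization, document-level word counts, binary labels where 1 marks the polarized class, and a hypothetical function name pmi_lexicon with hypothetical parameters min_count and top_k; the toy posts are invented for the example.

```python
import math
from collections import Counter


def pmi_lexicon(texts, labels, target=1, min_count=2, top_k=10):
    """Rank words by PMI with the target class, using document-level counts.

    Hypothetical helper for illustration only; the tokenization and the
    thresholds are assumptions, not the paper's settings.
    """
    n_docs = len(texts)
    n_target = sum(1 for y in labels if y == target)
    word_docs = Counter()         # number of posts containing each word
    word_target_docs = Counter()  # number of target-class posts containing it
    for text, y in zip(texts, labels):
        for word in set(text.lower().split()):
            word_docs[word] += 1
            if y == target:
                word_target_docs[word] += 1

    scores = {}
    for word, joint in word_target_docs.items():
        if word_docs[word] < min_count:
            continue  # skip rare words, whose PMI estimates are unstable
        p_joint = joint / n_docs
        p_word = word_docs[word] / n_docs
        p_class = n_target / n_docs
        # PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) )
        scores[word] = math.log2(p_joint / (p_word * p_class))

    # Highest PMI first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data, invented for this example (1 = polarized, 0 = not polarized).
toy_texts = [
    "they are traitors",
    "traitors ruin everything",
    "nice weather today",
    "they like the weather",
]
toy_labels = [1, 1, 0, 0]
lexicon = pmi_lexicon(toy_texts, toy_labels)
```

Words that never occur in a polarized post receive no score in this sketch, since their PMI with the polarized class is undefined (the logarithm of zero). The resulting ranked list could then select the terms that a downstream augmentation step targets.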