ClassBases at the CASE-2022 Multilingual Protest Event Detection Task: Multilingual Protest News Detection and Automatically Replicating Manually Created Event Datasets

Peratham Wiriyathammabhum


Abstract
In this report, we describe our ClassBases submissions to a shared task on multilingual protest event detection. For multilingual protest news detection, we participated in subtask-1, subtask-2 and subtask-4, which are document classification, sentence classification and token classification, respectively. In subtask-1, we compare XLM-RoBERTa-base, mLUKE-base and XLM-RoBERTa-large, each fine-tuned in a sequence classification setting. We always train our multilingual models on the combined training data from every language provided. We found that larger models seem to work better and that entity knowledge helps, but at a non-negligible cost. For subtask-2, we submitted only an mLUKE-base system for sentence classification. For subtask-4, we submitted only an XLM-RoBERTa-base token classification system for sequence labeling. For automatically replicating manually created event datasets, we participated in the task on COVID-related protest events from the New York Times news corpus. We created a system that processes the crawled data into a dataset of protest events.
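The abstract states that the multilingual models are always trained on the combined training data from every language. A minimal sketch of that pooling step follows. It is not the author's code: the function name `pool_languages`, the language codes, and the toy sentences are all hypothetical, and the fine-tuning step itself is omitted.

```python
import random


def pool_languages(per_language, seed=0):
    """Merge per-language (text, label) lists into one shuffled training list.

    Each pooled item keeps its language code so that per-language
    evaluation remains possible after the merge.
    """
    pooled = [
        (text, label, lang)
        for lang, examples in per_language.items()
        for text, label in examples
    ]
    # A seeded shuffle keeps the mixed-language order reproducible.
    random.Random(seed).shuffle(pooled)
    return pooled


# Hypothetical toy data: label 1 = protest news, 0 = other news.
toy = {
    "en": [("Workers marched downtown.", 1), ("Stocks rose today.", 0)],
    "es": [("Miles protestaron en la plaza.", 1)],
    "pt": [("O clima esteve ameno.", 0)],
}
train = pool_languages(toy)
```

The pooled list could then be tokenized and passed to any multilingual sequence classifier. Shuffling across languages is one simple way to avoid batches that contain a single language only.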
Anthology ID:
2022.case-1.21
Volume:
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Ali Hürriyetoğlu, Hristo Tanev, Vanni Zavarella, Erdem Yörük
Venue:
CASE
Publisher:
Association for Computational Linguistics
Pages:
149–154
URL:
https://aclanthology.org/2022.case-1.21
DOI:
10.18653/v1/2022.case-1.21
Cite (ACL):
Peratham Wiriyathammabhum. 2022. ClassBases at the CASE-2022 Multilingual Protest Event Detection Task: Multilingual Protest News Detection and Automatically Replicating Manually Created Event Datasets. In Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE), pages 149–154, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
ClassBases at the CASE-2022 Multilingual Protest Event Detection Task: Multilingual Protest News Detection and Automatically Replicating Manually Created Event Datasets (Wiriyathammabhum, CASE 2022)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2022.case-1.21.pdf
Video:
https://preview.aclanthology.org/emnlp22-frontmatter/2022.case-1.21.mp4