Abstract
This paper accompanies our top-performing submission to the CASE 2021 shared task, which is hosted at the workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text. Subtasks 1 and 2 of Task 1 concern the classification of newspaper articles and sentences into “conflict” versus “not conflict”-related in four different languages. Our model performs competitively in both subtasks (up to 0.8662 macro F1), obtaining the highest score of all contributions for subtask 1 on Hindi articles (0.7877 macro F1). We describe all experiments conducted with the XLM-RoBERTa (XLM-R) model and report results obtained in each binary classification task. We propose supplementing the original training data with additional data on political conflict events. In addition, we provide an analysis of unigram probability estimates and geospatial references contained within the original training corpus.

- Anthology ID: 2021.case-1.22
- Volume: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)
- Month: August
- Year: 2021
- Address: Online
- Venue: CASE
- Publisher: Association for Computational Linguistics
- Pages: 171–178
- URL: https://aclanthology.org/2021.case-1.22
- DOI: 10.18653/v1/2021.case-1.22
- Cite (ACL): Francesco Re, Daniel Vegh, Dennis Atzenhofer, and Niklas Stoehr. 2021. Team “DaDeFrNi” at CASE 2021 Task 1: Document and Sentence Classification for Protest Event Detection. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), pages 171–178, Online. Association for Computational Linguistics.
- Cite (Informal): Team “DaDeFrNi” at CASE 2021 Task 1: Document and Sentence Classification for Protest Event Detection (Re et al., CASE 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.case-1.22.pdf