Text Categorization for Conflict Event Annotation

Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, Kristine Eck


Abstract
We cast the problem of event annotation as one of text categorization, and compare state-of-the-art text categorization techniques on event data produced within the Uppsala Conflict Data Program (UCDP). Annotating a single text involves assigning labels for at least 17 distinct categorization tasks, e.g., who the attacking organization was, who was attacked, and where the event took place. The text categorization techniques under scrutiny are a classical Bag-of-Words approach; character-based contextualized embeddings produced by ELMo; embeddings produced by the BERT base model, and a version of BERT base fine-tuned on UCDP data; and a pre-trained and fine-tuned classifier based on ULMFiT. The categorization tasks are very diverse in terms of the number of classes to predict as well as the skewness of the class distribution. The categorization results vary widely across tasks, ranging from 30.3% to 99.8% F-score.
Anthology ID:
2020.aespen-1.5
Volume:
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
AESPEN
Publisher:
European Language Resources Association (ELRA)
Pages:
19–25
Language:
English
URL:
https://aclanthology.org/2020.aespen-1.5
Cite (ACL):
Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, and Kristine Eck. 2020. Text Categorization for Conflict Event Annotation. In Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020, pages 19–25, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Text Categorization for Conflict Event Annotation (Olsson et al., AESPEN 2020)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2020.aespen-1.5.pdf