Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest

Justin Sech, Alexandra DeLucia, Anna L. Buczak, Mark Dredze


Abstract
We present CUT, a dataset for studying Civil Unrest on Twitter. Our dataset includes 4,381 tweets related to civil unrest, hand-annotated with information related to the study of civil unrest discussion and events. Our dataset is drawn from 42 countries from 2014 to 2019. We present baseline systems trained on this data for the identification of tweets related to civil unrest. We include a discussion of ethical issues related to research on this topic.
Anthology ID:
2020.wnut-1.28
Volume:
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | WNUT
Publisher:
Association for Computational Linguistics
Pages:
215–221
URL:
https://aclanthology.org/2020.wnut-1.28
DOI:
10.18653/v1/2020.wnut-1.28
Cite (ACL):
Justin Sech, Alexandra DeLucia, Anna L. Buczak, and Mark Dredze. 2020. Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 215–221, Online. Association for Computational Linguistics.
Cite (Informal):
Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest (Sech et al., WNUT 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.wnut-1.28.pdf
Optional supplementary material:
2020.wnut-1.28.OptionalSupplementaryMaterial.pdf
Code:
aadelucia/jhu-cut