Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk

Kokil Jaidka


Abstract
This paper motivates and presents the Twitter Deliberative Politics dataset, a corpus of political tweets labeled for their deliberative characteristics. The corpus was randomly sampled from replies to US congressmen and congresswomen. It is expected to be useful to the broader community of computational linguists, political scientists, and social scientists interested in the study of online political expression, computer-mediated communication, and political deliberation. The paper discusses the data sampling and annotation methods, and it evaluates classical machine learning approaches for their predictive performance on the different deliberative facets. The paper concludes with a discussion of future work aimed at developing dictionaries for the quality assessment of online political talk in English. The dataset and a demo dashboard are available at https://github.com/kj2013/twitter-deliberative-politics.
Anthology ID: 2022.lrec-1.589
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 5503–5510
URL: https://aclanthology.org/2022.lrec-1.589
Cite (ACL): Kokil Jaidka. 2022. Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5503–5510, Marseille, France. European Language Resources Association.
Cite (Informal): Developing A Multilabel Corpus for the Quality Assessment of Online Political Talk (Jaidka, LREC 2022)
PDF: https://preview.aclanthology.org/improve-issue-templates/2022.lrec-1.589.pdf