TwiConv: A Coreference-annotated Corpus of Twitter Conversations

Berfin Aktaş, Annalena Kohnert


Abstract
This article introduces TwiConv, an English coreference-annotated corpus of microblog conversations from Twitter. We describe the corpus compilation process and the annotation scheme, and release the corpus publicly along with this paper. We manually annotated nominal coreference in 1756 tweets arranged in 185 conversation threads. The annotation achieves satisfactory inter-annotator agreement. We also present a new method for mapping tweet contents to distributed stand-off annotations, which can easily be adapted to different annotation tasks.
Anthology ID:
2020.crac-1.6
Volume:
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
December
Year:
2020
Address:
Barcelona, Spain (online)
Editors:
Maciej Ogrodniczuk, Vincent Ng, Yulia Grishina, Sameer Pradhan
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Pages:
47–54
URL:
https://aclanthology.org/2020.crac-1.6
Cite (ACL):
Berfin Aktaş and Annalena Kohnert. 2020. TwiConv: A Coreference-annotated Corpus of Twitter Conversations. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pages 47–54, Barcelona, Spain (online). Association for Computational Linguistics.
Cite (Informal):
TwiConv: A Coreference-annotated Corpus of Twitter Conversations (Aktaş & Kohnert, CRAC 2020)
PDF:
https://preview.aclanthology.org/proper-vol2-ingestion/2020.crac-1.6.pdf
Code:
berfingit/twiconv