Abstract
This article introduces TwiConv, an English coreference-annotated corpus of microblog conversations from Twitter. We describe the corpus compilation process and the annotation scheme, and release the corpus publicly, along with this paper. We manually annotated nominal coreference in 1756 tweets arranged in 185 conversation threads. The annotation achieves satisfactory annotation agreement results. We also present a new method for mapping the tweet contents with distributed stand-off annotations, which can easily be adapted to different annotation tasks.
- Anthology ID:
- 2020.crac-1.6
- Volume:
- Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (online)
- Editors:
- Maciej Ogrodniczuk, Vincent Ng, Yulia Grishina, Sameer Pradhan
- Venue:
- CRAC
- Publisher:
- Association for Computational Linguistics
- Pages:
- 47–54
- URL:
- https://aclanthology.org/2020.crac-1.6
- Cite (ACL):
- Berfin Aktaş and Annalena Kohnert. 2020. TwiConv: A Coreference-annotated Corpus of Twitter Conversations. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference, pages 47–54, Barcelona, Spain (online). Association for Computational Linguistics.
- Cite (Informal):
- TwiConv: A Coreference-annotated Corpus of Twitter Conversations (Aktaş & Kohnert, CRAC 2020)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2020.crac-1.6.pdf
- Code:
- berfingit/twiconv