GOLEMcoref: A Multilingual Coreference Dataset of Fiction
Andreas Van Cranenburgh, Xiaoyan Yang, Alvanita, Cecilia Nicole Di Domenico, Maria Ferragud, Arianna Graciotti, Byungjun Kim, Seonyeong Park, Noa Visser Solissa, Xiaoyu Zhou, Federico Pianzola
Abstract
We present a multilingual coreference dataset of 827k tokens of fiction in 7 languages: Bahasa Indonesia, Chinese, Dutch, English, Italian, Korean, and Spanish. The dataset includes full stories of diverse lengths, ranging from 500 to 17k words. We discuss our annotation scheme focusing on characters and language-specific challenges we encountered. Finally, we present evaluation results of a neural coreference system trained on our dataset. We show that jointly training a system across all languages provides a strong improvement over monolingually trained models. The dataset is available under a Creative Commons license in CoNLL-2012 and CorefUD format at https://github.com/GOLEM-lab/GOLEMcoref/
- Anthology ID: 2026.acl-short.39
- Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 472–480
- URL: https://preview.aclanthology.org/ingest-acl/2026.acl-short.39/
- Cite (ACL): Andreas Van Cranenburgh, Xiaoyan Yang, Alvanita, Cecilia Nicole Di Domenico, Maria Ferragud, Arianna Graciotti, Byungjun Kim, Seonyeong Park, Noa Visser Solissa, Xiaoyu Zhou, and Federico Pianzola. 2026. GOLEMcoref: A Multilingual Coreference Dataset of Fiction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 472–480, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): GOLEMcoref: A Multilingual Coreference Dataset of Fiction (Van Cranenburgh et al., ACL 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-short.39.pdf