Abstract
Academic neural models for coreference resolution (coref) are typically trained on a single dataset, OntoNotes, and model improvements are benchmarked on that same dataset. However, real-world applications of coref depend on the annotation guidelines and the domain of the target dataset, which often differ from those of OntoNotes. We aim to quantify the transferability of coref models based on the number of annotated documents available in the target dataset. We examine eleven target datasets and find that continued training is consistently effective and especially beneficial when there are few target documents. We establish new benchmarks across several datasets, including state-of-the-art results on PreCo.
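The "continued training" transfer strategy the abstract describes amounts to taking a coref model already trained on a source dataset (e.g. OntoNotes) and fine-tuning it on the available target-dataset documents. Below is a minimal sketch of that loop in generic PyTorch; the model interface, hyperparameters, and data handling are hypothetical stand-ins, not the authors' actual training code.

```python
# Sketch of continued training for coref model transfer (assumptions:
# `model` is a source-trained model whose forward pass returns its own
# training loss; `target_docs` yields per-document feature dicts).
import torch
from torch.utils.data import DataLoader, Dataset


def continued_training(model: torch.nn.Module,
                       target_docs: Dataset,
                       epochs: int = 20,
                       lr: float = 1e-5) -> torch.nn.Module:
    """Fine-tune a source-trained (e.g. OntoNotes) coref model on target documents."""
    # Coref models typically process one document at a time.
    loader = DataLoader(target_docs, batch_size=1, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch)  # assumed: model computes its coref loss internally
            loss.backward()
            optimizer.step()
    return model
```

The appeal of this recipe, per the abstract's finding, is that it pays off even with very few target documents, since the source-trained weights already encode general coref ability.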
Anthology ID: 2021.emnlp-main.425
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5241–5256
URL: https://aclanthology.org/2021.emnlp-main.425
DOI: 10.18653/v1/2021.emnlp-main.425
Cite (ACL): Patrick Xia and Benjamin Van Durme. 2021. Moving on from OntoNotes: Coreference Resolution Model Transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5241–5256, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Moving on from OntoNotes: Coreference Resolution Model Transfer (Xia & Van Durme, EMNLP 2021)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2021.emnlp-main.425.pdf
Code: additional community code
Data: GAP Coreference Dataset, PreCo