Jakub Hejman
2026
Generative Multilingual Coreference Resolution at CRAC 2026
Jakub Hejman | Ondrej Prazak | Miloslav Konopík
Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026)
Participating again in this year’s edition of the CRAC shared task on coreference resolution, we present our upgraded system, which improves the official CoNLL score by 15.46 percentage points. This gain comes from adopting the larger Gemma 3 27B IT model, joint pre-training, headword tagging, more efficient training and inference, and a sliding window. Our system placed second in the LLM track and third overall with a primary score of 73.83, and it reached the highest scores on two datasets. Finally, we compare specialized and general LLM approaches.
2025
Fine-Tuned Llama for Multilingual Text-to-Text Coreference Resolution
Jakub Hejman | Ondrej Prazak | Miloslav Konopík
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
This paper describes our approach to the CRAC 2025 Shared Task on Multilingual Coreference Resolution. We compete in the LLM track, where the systems are limited to generative text-to-text approaches. Our system is based on Llama 3.1-8B, fine-tuned to tag the document with coreference annotations. We have made one significant modification to the text format provided by the organizers: The model relies on the syntactic head for mention span representation. Additionally, we use joint pre-training, and we train the model to generate empty nodes. We provide an in-depth analysis of the performance of our models, which reveals several implementation problems. Although our system ended up in last place, we achieved the best performance on 10 datasets out of 22 within the track. By fixing the discovered problems in the post-evaluation phase, we improved our results substantially, outperforming all the systems in the LLM track and even some unconstrained track systems.