Antoine Bourgois
2026
Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution
Antoine Bourgois | Olga Seminck | Thierry Poibeau
Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026)
We present our submission to the LLM track of the 2026 Computational Models of Reference, Anaphora and Coreference (CRAC 2026) shared task. With an average CoNLL F1 score of 74.32 on the official test set, our system ranked first in the LLM track, and third overall. Our system is based on the Gemma-3-27b model, fine-tuned using a two-stage strategy with a multilingual base adapter followed by dataset-specific adapters. We represent mention spans by their headword using an XML-inspired format with local reindexing and annotate documents iteratively. These design choices proved effective across languages, document lengths, and annotation guidelines.
2025
The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works
Antoine Bourgois | Thierry Poibeau
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
While coreference resolution is attracting more interest than ever from computational literature researchers, representative datasets of fully annotated long documents remain surprisingly scarce. In this paper, we introduce a new annotated corpus of three full-length French novels, totaling over 285,000 tokens. Unlike previous datasets focused on shorter texts, our corpus addresses the challenges posed by long, complex literary works, enabling evaluation of coreference models in the context of long reference chains. We present a modular coreference resolution pipeline that allows for fine-grained error analysis. We show that our approach is competitive and scales effectively to long documents. Finally, we demonstrate its usefulness for inferring the gender of fictional characters, showcasing its relevance for both literary analysis and downstream NLP tasks.
GLaRef@CRAC2025: Should we transform coreference resolution into a text generation task?
Olga Seminck | Antoine Bourgois | Yoann Dupont | Mathieu Dehouck | Marine Delaborde
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
We present our team's submissions to the Unconstrained and LLM tracks of the Computational Models of Reference, Anaphora and Coreference (CRAC2025) shared task, where we placed fifth and first, respectively. The two systems obtained similar scores (average CoNLL-F1 of 61.57 and 62.96 on the test set) but differed greatly in computational cost: the classical pair-wise resolution system submitted to the Unconstrained track reached comparable performance at less than 10% of the computational cost. Reflecting on this, we point out the problems we ran into when using generative AI for coreference resolution, and we explain how the text generation framework stands in the way of a reliable text-global coreference representation. Nonetheless, we see many potential improvements to our LLM system, which we discuss at the end of this article.