A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction

Marco Martinelli, Stefano Marchesin, Vanessa Bonato, Giorgio Di Nunzio, Nicola Ferro, Ornella Irrera, Laura Menotti, Federica Vezzani, Gianmaria Silvello


Abstract
Information Extraction (IE), encompassing Named Entity Recognition (NER), Named Entity Linking (NEL), and Relation Extraction (RE), is critical for transforming the rapidly growing volume of scientific publications into structured, actionable knowledge. This need is especially evident in fast-evolving biomedical fields such as the gut-brain axis, where research investigates complex interactions between the gut microbiota and brain-related disorders. Existing biomedical IE benchmarks, however, are often narrow in scope and rely heavily on distantly supervised or automatically generated annotations, limiting their utility for advancing robust IE methods. We introduce GutBrainIE, a benchmark based on more than 1,600 PubMed abstracts, manually annotated by biomedical and terminological experts with fine-grained entities, concept-level links, and relations. While grounded in the gut-brain axis, the benchmark’s rich schema, multiple tasks, and combination of highly curated and weakly supervised data make it broadly applicable to the development and evaluation of biomedical IE systems across domains.
Anthology ID:
2026.findings-eacl.301
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5693–5711
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.301/
Cite (ACL):
Marco Martinelli, Stefano Marchesin, Vanessa Bonato, Giorgio Di Nunzio, Nicola Ferro, Ornella Irrera, Laura Menotti, Federica Vezzani, and Gianmaria Silvello. 2026. A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5693–5711, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction (Martinelli et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.301.pdf
Checklist:
2026.findings-eacl.301.checklist.pdf