PetEVAL: A veterinary free text electronic health records benchmark

Sean Farrell, Alan Radford, Noura Al Moubayed, Peter-John Noble


Abstract
We introduce PetEVAL, the first benchmark dataset derived from real-world, free-text veterinary electronic health records (EHRs). PetEVAL comprises 17,600 professionally annotated EHRs from first-opinion veterinary practices across the UK, partitioned into training (11,000), evaluation (1,600), and test (5,000) sets with distinct clinic distributions to assess model generalisability. Each record is annotated with International Classification of Diseases 11th Revision (ICD-11) syndromic chapter labels (20,408 labels), disease Named Entity Recognition (NER) tags (429 labels), and anonymisation NER tags (8,244 labels). PetEVAL enables the evaluation of Natural Language Processing (NLP) tools across applications, including syndrome surveillance and disease outbreak detection. We implement a multistage anonymisation protocol, replacing identifiable information with clinically relevant pseudonyms while establishing the first definition of identifiers in veterinary free text. PetEVAL introduces three core tasks: syndromic classification, disease entity recognition, and anonymisation. We provide baseline results using BERT-base, PetBERT, and LLaMA 3.1 8B generative models. Our experiments demonstrate the unique challenges of veterinary text, showcasing the importance of domain-specific approaches. By fostering advancements in veterinary informatics and epidemiology, we envision PetEVAL catalysing innovations in veterinary care, animal health, and comparative biomedical research through access to real-world, annotated veterinary clinical data.
Anthology ID:
2025.bionlp-1.29
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
341–353
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.29/
Cite (ACL):
Sean Farrell, Alan Radford, Noura Al Moubayed, and Peter-John Noble. 2025. PetEVAL: A veterinary free text electronic health records benchmark. In ACL 2025, pages 341–353, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
PetEVAL: A veterinary free text electronic health records benchmark (Farrell et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.29.pdf
Supplementary material:
 2025.bionlp-1.29.SupplementaryMaterial.txt
Supplementary material:
 2025.bionlp-1.29.SupplementaryMaterial.zip