Lightweight Domain-Specific Language Model for Real-Time Structuring of Medical Prescriptions

Jonathan Pattin Cottet, Véronique Eglin, Alex Aussem


Abstract
Automated structuring of medical prescriptions is critical for downstream safety checks in pharmacies, yet remains challenging due to heterogeneous layouts, OCR noise, and dense clinical abbreviations in real-world documents. Existing language models either ignore layout information, rely on computationally expensive image-based architectures, or cannot operate under strict privacy and hardware constraints such as GDPR and HDS-certified environments. We present a lightweight (<10M parameters), privacy-preserving transformer specifically designed for Entity Extraction (EE) and Entity Linking (EL) in French medical prescriptions. The model uses only OCR text and normalized 2D word coordinates, enabling robust pseudonymisation and real-time CPU-level inference while preserving essential spatial cues. It is pretrained on a large corpus of pseudonymised OCR outputs using objectives tailored to prescription structure, including a novel Token-to-Line Alignment (TLA) task, and fine-tuned on the Rx-PAD dataset (Pattin Cottet et al., 2025). Empirical results show that our approach matches or surpasses larger document-understanding models and rivals multimodal LLMs on strict extraction metrics, while achieving sub-second latency suitable for operational deployment. The system is currently used in 230 pharmacies, demonstrating both scalability and practical relevance. These findings highlight the importance of specialized, domain-aware, lightweight models for safe, efficient, and legally compliant prescription verification.
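The abstract states that the model consumes OCR text together with normalized 2D word coordinates, but it does not give the normalization scheme. As an illustration only, the sketch below rescales each OCR word box to a fixed integer grid, independent of page size. The 0 to 1000 range is an assumption borrowed from common layout-aware models, and the names `OcrWord` and `normalize_word` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One OCR token with its bounding box in page pixels (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_word(word, page_width, page_height, scale=1000):
    """Rescale a word box to integers in [0, scale].

    Assumption: a 0..scale grid; the paper does not specify its scheme.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")

    def clamp(value):
        # OCR boxes can spill slightly outside the page; keep them on the grid.
        return max(0, min(scale, value))

    return (
        word.text,
        clamp(round(scale * word.x0 / page_width)),
        clamp(round(scale * word.y0 / page_height)),
        clamp(round(scale * word.x1 / page_width)),
        clamp(round(scale * word.y1 / page_height)),
    )


if __name__ == "__main__":
    word = OcrWord("Doliprane", x0=100, y0=200, x1=300, y1=250)
    print(normalize_word(word, page_width=1000, page_height=2000))
```

Because only token strings and grid coordinates are kept, an input of this kind carries no pixel data, which fits the abstract's point that the representation supports pseudonymisation and CPU-level inference.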
Anthology ID:
2026.eacl-industry.68
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
915–926
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.68/
Cite (ACL):
Jonathan Pattin Cottet, Véronique Eglin, and Alex Aussem. 2026. Lightweight Domain-Specific Language Model for Real-Time Structuring of Medical Prescriptions. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 915–926, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Lightweight Domain-Specific Language Model for Real-Time Structuring of Medical Prescriptions (Pattin Cottet et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.68.pdf