Zero-Shot ATC Coding with Large Language Models for Clinical Assessments
Zijian Chen, John-Michael Gamble, Micaela Jantzi, John P. Hirdes, Jimmy Lin
Abstract
Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). Inspired by recent advances in automatic International Classification of Diseases (ICD) coding, our method frames ATC coding as a hierarchical information extraction task, guiding LLMs through the ATC ontology level by level. We evaluate our approach using GPT-4o as an accuracy ceiling and focus development on open-source Llama models suitable for privacy-sensitive deployment. Testing across Health Canada drug product data, the RABBITS benchmark, and real clinical notes from Ontario Health, our method achieves 78% exact match accuracy with GPT-4o and 60% with Llama 3.1 70B. We investigate knowledge grounding through drug definitions, finding modest improvements in accuracy. Further, we show that fine-tuned Llama 3.1 8B matches zero-shot Llama 3.1 70B accuracy, suggesting that effective ATC coding is feasible with smaller models. Our results demonstrate the feasibility of automatic ATC coding in privacy-sensitive healthcare environments, providing a foundation for future deployments.- Anthology ID:
- 2025.naacl-industry.19
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Weizhu Chen, Yi Yang, Mohammad Kachuee, Xue-Yong Fu
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 226–232
- Language:
- URL:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-industry.19/
- DOI:
- Cite (ACL):
- Zijian Chen, John-Michael Gamble, Micaela Jantzi, John P. Hirdes, and Jimmy Lin. 2025. Zero-Shot ATC Coding with Large Language Models for Clinical Assessments. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pages 226–232, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Zero-Shot ATC Coding with Large Language Models for Clinical Assessments (Chen et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-industry.19.pdf