Gauher Shaheen


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
Large Language Models Encode the Practice of Medicine
Teja Kanchinadam | Gauher Shaheen
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

Healthcare tasks such as predicting clinical outcomes across medical and surgical populations, disease prediction, predicting patient health journeys, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of billions of administrative claims, which essentially encapsulates the practice of medicine, offering a unique perspective on patient care and treatment patterns. Our model, MediClaimGPT, a 125M parameter Transformer demonstrates strong zero-shot predictive capabilities, accurately forecasting patient health events across four evaluation datasets, with its capabilities further demonstrated in various downstream tasks. A significant application of MediClaimGPT is in generating high-quality, clinically plausible synthetic claims data, enhancing healthcare data utility while preserving patient privacy. This research underscores the potential of language models in handling complex datasets and their strategic application in healthcare and related fields.