Islam Nassar
2025
Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
Islam Nassar | Yang Lin | Yuan Jin | Rongxin Zhu | Chang Wei Tan | Zenan Zhai | Nitika Mathur | Thanh Tien Vu | Xu Zhong | Long Duong | Yuan-Fang Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Because these codes are used primarily for billing, it is in physicians’ best interest to provide accurate CPT E/M codes. Automating this coding task will help alleviate physicians’ documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities have made E/M coding automation a challenging task. In this paper, we elaborate on some of the key complexities and present ProFees, our LLM-based framework that tackles them, followed by a systematic evaluation. On an expert-curated real-world dataset, ProFees achieves an increase in coding accuracy of more than 36% over a commercial CPT E/M coding system and almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing the real-world complexities.
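The abstract reports results rather than implementation details. As a rough illustration only, the sketch below shows what the kind of single-prompt LLM baseline it mentions might look like in Python; the prompt wording, the code subset, and the model interface are assumptions, not the ProFees framework itself.

```python
# Hypothetical single-prompt LLM baseline for E/M level selection.
# The prompt, code subset, and model interface are placeholders,
# not the ProFees framework described in the paper.
from typing import Callable

# Office/outpatient E/M codes for established patients (illustrative subset).
EM_CODES = ["99212", "99213", "99214", "99215"]

PROMPT_TEMPLATE = """You are a certified medical coder.
Read the clinical note below and select the single most appropriate
CPT E/M code from this list: {codes}.
Answer with the code only.

Clinical note:
{note}
"""

def assign_em_code(note: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM for an E/M code; fall back to the lowest level
    if the reply is not one of the allowed codes."""
    prompt = PROMPT_TEMPLATE.format(codes=", ".join(EM_CODES), note=note)
    reply = complete(prompt).strip()
    return reply if reply in EM_CODES else EM_CODES[0]
```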
2022
Generate, Annotate, and Learn: NLP with Synthetic Text
Xuanli He | Islam Nassar | Jamie Kiros | Gholamreza Haffari | Mohammad Norouzi
Transactions of the Association for Computational Linguistics, Volume 10
This paper studies the use of language models as a source of synthetic unlabeled text for NLP. We formulate a general framework called “generate, annotate, and learn (GAL)” to take advantage of synthetic text within knowledge distillation, self-training, and few-shot learning applications. To generate high-quality task-specific text, we either fine-tune LMs on inputs from the task of interest, or prompt large LMs with few examples. We use the best available classifier to annotate synthetic text with soft pseudo labels for knowledge distillation and self-training, and use LMs to obtain hard labels for few-shot learning. We train new supervised models on the combination of labeled and pseudo-labeled data, which results in significant gains across several applications. We investigate key components of GAL and present theoretical and empirical arguments against the use of class-conditional LMs to generate synthetic labeled text instead of unlabeled text. GAL achieves new state-of-the-art knowledge distillation results for 6-layer transformers on the GLUE leaderboard.
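As a rough illustration of the generate, annotate, and learn loop summarized above, here is a minimal Python sketch of one GAL-style round under assumed model and data interfaces; it is not the authors' implementation.

```python
# Minimal sketch of one GAL-style generate-annotate-learn round
# (assumed interfaces, not the authors' implementation).

def gal_round(labeled_data, generate_synthetic, teacher, train_student,
              n_synthetic=10_000):
    """labeled_data:       list of (text, label) pairs from the task of interest
    generate_synthetic:    callable returning n task-like unlabeled texts, e.g.
                           samples from an LM fine-tuned on the task's inputs
    teacher:               best available classifier; predict_proba(text) gives
                           the soft pseudo-label used for distillation/self-training
    train_student:         trains a new supervised model on labeled plus
                           pseudo-labeled data and returns it
    """
    # 1) Generate: draw synthetic unlabeled text that resembles the task inputs.
    synthetic_texts = generate_synthetic(n_synthetic)
    # 2) Annotate: pseudo-label the synthetic text with the teacher's soft outputs.
    pseudo_labeled = [(t, teacher.predict_proba(t)) for t in synthetic_texts]
    # 3) Learn: fit a new (possibly smaller) student on the combined data.
    return train_student(labeled_data, pseudo_labeled)
```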
2019
Neural Versus Non-Neural Text Simplification: A Case Study
Islam Nassar | Michelle Ananda-Rajah | Gholamreza Haffari
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association