Robert Mercer

Papers on this page may belong to the following people: Robert E. Mercer (Univ. of Western Ontario), Robert L. Mercer (IBM)

2026

pdf bib abs

We present the CanSA system for the MedEx-ACT@ACL 2026 shared task, which requires extracting and classifying clinical decisions from ICU discharge summaries into nine DIC-TUM categories. We have developed three approaches: (1) a training-free system which consists of a preprocessing module that normalizes text and an inference engine combining zero shot LLMs with a RAG ensemble, (2) a supervised fine-tuning method which required training, and (3) a training-free retrieval-augmented pipeline employing TF–IDF-based lexical retrieval to surface in-context exemplars from the development corpus, combined with section aware chunking and structured extraction calls to a large language model. Our team’s best submission achieved a Final Score of 0.41, ranking 34th out of 37 on the official test leaderboard.

2024

pdf bib abs

Anisotropy is Not Inherent to Transformers
Anemily Machina | Robert Mercer
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Isotropy is the property that embeddings are uniformly distributed around the origin. Previous work has shown that Transformer embedding spaces are anisotropic, which is called the representation degradation problem. This degradation has been assumed to be inherent to the standard language modeling tasks and to apply to all Transformer models regardless of their architecture. In this work we identify a set of Transformer models with isotropic embedding spaces, the large Pythia models. We examine the isotropy of Pythia models and explore how isotropy and anisotropy develop as a model is trained. We find that anisotropic models do not develop as previously theorized, using our own analysis to show that the large Pythia models optimize their final Layer Norm for isotropy, and provide reasoning why previous theoretical justifications for anisotropy were insufficient. The identification of a set of isotropic Transformer models calls previous assumptions into question, provides a set of models to contrast existing analysis, and should lead to deeper insight into isotropy.

pdf bib abs

Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation
Xindi Wang | Robert Mercer | Frank Rudzicz
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The International Classification of Diseases (ICD) serves as a definitive medical classification system encompassing a wide range of diseases and conditions. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record, which facilitates standardized documentation and management of various health conditions. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection with a heavy long-tailed label distribution. In this paper, we leverage a multi-stage “retrieve and re-rank” framework as a novel solution to ICD indexing, via a hybrid discrete retrieval method, and re-rank retrieved candidates with contrastive learning that allows the model to make more accurate predictions from a simplified label space. The retrieval model is a hybrid of auxiliary knowledge of the electronic health records (EHR) and a discrete retrieval method (BM25), which efficiently collects high-quality candidates. In the last stage, we propose a label co-occurrence guided contrastive re-ranking model, which re-ranks the candidate labels by pulling together the clinical notes with positive ICD codes. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures on the MIMIC-III benchmark.

Co-authors

Yetian Wang 1

Venues

Fix author