William Boag


Publicly Available Clinical BERT Embeddings
Emily Alsentzer | John Murphy | William Boag | Wei-Hung Weng | Di Jindi | Tristan Naumann | Matthew McDermott
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three of five clinical NLP tasks, establishing a new state of the art on the MedNLI dataset. We find that these domain-specific models are not as performant on two clinical de-identification tasks, and argue that this is a natural consequence of the differences between de-identified source text and synthetically non-de-identified task text.


MUTT: Metric Unit TesTing for Language Generation Tasks
William Boag | Renan Campos | Kate Saenko | Anna Rumshisky
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

SimiHawk at SemEval-2016 Task 1: A Deep Ensemble System for Semantic Textual Similarity
Peter Potash | William Boag | Alexey Romanov | Vasili Ramanishka | Anna Rumshisky
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


TwitterHawk: A Feature Bucket Based Approach to Sentiment Analysis
William Boag | Peter Potash | Anna Rumshisky
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)