Tristan Naumann


2021

Modular Self-Supervision for Document-Level Relation Extraction
Sheng Zhang | Cliff Wong | Naoto Usuyama | Sarthak Jain | Tristan Naumann | Hoifung Poon
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications. Compared to conventional information extraction confined to short text spans, document-level relation extraction faces additional challenges in both inference and learning. Given longer text spans, state-of-the-art neural architectures are less effective and task-specific self-supervision such as distant supervision becomes very noisy. In this paper, we propose decomposing document-level relation extraction into relation detection and argument resolution, taking inspiration from Davidsonian semantics. This enables us to incorporate explicit discourse modeling and leverage modular self-supervision for each sub-problem, which is less noise-prone and can be further refined end-to-end via variational EM. We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent. Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points. The gain is particularly pronounced among the most challenging relation instances whose arguments never co-occur in a paragraph.

2020

Proceedings of the 3rd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the 3rd Clinical Natural Language Processing Workshop

2019

Proceedings of the 2nd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Publicly Available Clinical BERT Embeddings
Emily Alsentzer | John Murphy | William Boag | Wei-Hung Weng | Di Jindi | Tristan Naumann | Matthew McDermott
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three of five clinical NLP tasks, establishing a new state of the art on the MedNLI dataset. We find that these domain-specific models are not as performant on two clinical de-identification tasks, and argue that this is a natural consequence of the differences between de-identified source text and synthetically non-de-identified task text.

2016

Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)